Orgnaization & cleanup
This commit is contained in:
+107
-92
@@ -15,10 +15,10 @@ modes dispatched by a push constant:
|
||||
shader premultiplies the texture sample (`t.rgb *= t.a`) and computes `out = color * t`.
|
||||
|
||||
- **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
|
||||
`Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
|
||||
reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
|
||||
`Base_2D_Primitive` structs (96 bytes each) uploaded each frame to a GPU storage buffer. The vertex
|
||||
shader reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
|
||||
primitive bounds. The fragment shader dispatches on `Shape_Kind` (encoded in the low byte of
|
||||
`Primitive.flags`) to evaluate one of four signed distance functions:
|
||||
`Base_2D_Primitive.flags`) to evaluate one of four signed distance functions:
|
||||
- **RRect** (kind 1) — `sdRoundedBox` with per-corner radii. Covers rectangles (sharp or rounded),
|
||||
circles (uniform radii = half-size), and line segments / capsules (rotated RRect with uniform
|
||||
radii = half-thickness). Covers filled, outlined, textured, and gradient-filled variants.
|
||||
@@ -28,21 +28,22 @@ modes dispatched by a push constant:
|
||||
normals. Covers full rings, partial arcs, and pie slices (`inner_radius = 0`).
|
||||
|
||||
All SDF shapes support fill, outline, solid color, 2-color linear gradients, 2-color radial
|
||||
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). Gradient and outline
|
||||
parameters are packed into the same 16 bytes as the texture UV rect via a `Uv_Or_Effects` raw union
|
||||
— zero size increase to the 80-byte `Primitive` struct. Gradient/outline and texture are mutually
|
||||
exclusive.
|
||||
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). The texture UV rect
|
||||
(`uv_rect: [4]f32`) and the gradient/outline parameters (`effects: Gradient_Outline`) live in their
|
||||
own 16-byte slots in `Base_2D_Primitive`, so a primitive can carry texture and outline simultaneously.
|
||||
Gradient and texture remain mutually exclusive at the fill-source level (a Brush variant chooses one
|
||||
or the other) since they share the worst-case fragment-shader register path.
|
||||
|
||||
All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep` —
|
||||
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
|
||||
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (96 bytes)
|
||||
instead of ~250 vertices (~5000 bytes).
|
||||
|
||||
The main pipeline's register budget is **≤24 registers** (see "Main/effects split: register pressure"
|
||||
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context). The
|
||||
fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
|
||||
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context).
|
||||
The fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
|
||||
with native mediump) via manual live-range analysis. The dominant peak is the Ring_Arc kind path
|
||||
(wedge normals + inner/outer radii + dot-product temporaries live simultaneously with carried state
|
||||
like `f_color`, `f_uv_or_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
|
||||
like `f_color`, `f_uv_rect`/`f_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
|
||||
replaces the separate inner/outer + normal pairs). NGon and Ellipse are lighter still. Real compilers
|
||||
apply live-range coalescing, mediump-to-fp16 promotion, and rematerialization that typically shave
|
||||
2–4 regs from hand-counted estimates — the conservative 26-reg upper bound is expected to compile
|
||||
@@ -439,12 +440,13 @@ vertex shader branches on this uniform to select the tessellated or SDF code pat
|
||||
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
|
||||
(SDL_ttf atlas sampling), triangles, triangle fans/strips, single-pixel points, and any
|
||||
user-provided raw vertex geometry.
|
||||
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
|
||||
structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
|
||||
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of
|
||||
`Base_2D_Primitive` structs, drawn instanced. Used for all shapes with closed-form signed distance
|
||||
functions.
|
||||
|
||||
Both modes use the same fragment shader. The fragment shader checks `Shape_Kind` (low byte of
|
||||
`Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture sample
|
||||
and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
|
||||
`Base_2D_Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture
|
||||
sample and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
|
||||
Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shape_Flags` bits.
|
||||
|
||||
#### Why SDF for shapes
|
||||
@@ -452,8 +454,8 @@ Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shap
|
||||
CPU-side adaptive tessellation for curved shapes (the current approach) has three problems:
|
||||
|
||||
1. **Vertex bandwidth.** A rounded rectangle with four corner arcs produces ~250 vertices × 20 bytes
|
||||
= 5 KB. An SDF rounded rectangle is one `Primitive` struct (~56 bytes) plus 4 shared unit-quad
|
||||
vertices. That is roughly a 90× reduction per shape.
|
||||
= 5 KB. An SDF rounded rectangle is one `Base_2D_Primitive` struct (96 bytes) plus 4 shared
|
||||
unit-quad vertices. That is roughly a 50× reduction per shape.
|
||||
|
||||
2. **Quality.** Tessellated curves are piecewise-linear approximations. At high DPI or under
|
||||
animation/zoom, faceting is visible at any practical segment count. SDF evaluation produces
|
||||
@@ -484,14 +486,14 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
|
||||
shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
|
||||
pattern used by both Zed GPUI and vger-rs.
|
||||
|
||||
Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
|
||||
vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
|
||||
vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
|
||||
interpolated varyings.
|
||||
Each SDF shape is described by a single `Base_2D_Primitive` struct (96 bytes) in the storage
|
||||
buffer. The vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position
|
||||
from the unit vertex and the primitive's bounds, and passes shape parameters to the fragment shader
|
||||
via `flat` interpolated varyings.
|
||||
|
||||
Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
|
||||
buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
|
||||
80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
|
||||
96 bytes instead of 4 vertices × 60+ bytes = 240+ bytes.
|
||||
|
||||
The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
|
||||
buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
|
||||
@@ -499,15 +501,18 @@ in a draw call has the same mode — so it is effectively free on all modern GPU
|
||||
|
||||
#### Shape kinds and SDF dispatch
|
||||
|
||||
The fragment shader dispatches on `Shape_Kind` (low byte of `Primitive.flags`) to evaluate one of
|
||||
four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are defined in
|
||||
`pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate `Primitive`
|
||||
and set the kind automatically:
|
||||
The fragment shader dispatches on `Shape_Kind` (low byte of `Base_2D_Primitive.flags`) to evaluate
|
||||
one of four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are
|
||||
defined in `pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate
|
||||
`Base_2D_Primitive` and set the kind automatically:
|
||||
|
||||
Each user-facing shape proc accepts a `Brush` union (color, linear gradient, radial gradient,
|
||||
or textured fill) as its fill source, plus optional outline parameters. The procs map to SDF
|
||||
kinds as follows:
|
||||
|
||||
| User-facing proc | Shape_Kind | SDF function | Notes |
|
||||
| -------------------- | ---------- | ------------------ | ---------------------------------------------------------- |
|
||||
| `rectangle` | `RRect` | `sdRoundedBox` | Per-corner radii from `radii` param |
|
||||
| `rectangle_texture` | `RRect` | `sdRoundedBox` | Textured fill; `.Textured` flag set |
|
||||
| `circle` | `RRect` | `sdRoundedBox` | Uniform radii = half-size (circle is a degenerate RRect) |
|
||||
| `line`, `line_strip` | `RRect` | `sdRoundedBox` | Rotated capsule — stadium shape (radii = half-thickness) |
|
||||
| `ellipse` | `Ellipse` | `sdEllipseApprox` | Approximate ellipse SDF (fast, suitable for UI) |
|
||||
@@ -599,20 +604,21 @@ to is a hard GPU constraint; the only way to satisfy it is to end the current re
|
||||
a new one. That render-pass boundary is what a “bracket” is.
|
||||
|
||||
**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
|
||||
(downsample → horizontal blur → vertical-blur+composite), following the standard approach used by
|
||||
iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
|
||||
(downsample → horizontal blur → vertical blur → composite), following the standard approach used
|
||||
by iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
|
||||
sub-pass is budgeted at **≤24 registers** (same as the main pipeline — full Valhall occupancy). The
|
||||
multi-pass approach avoids the monolithic 70+ register shader that a single-pass Gaussian blur would
|
||||
require, keeping each sub-pass well under the 32-register cliff.
|
||||
|
||||
**Approach B: render-target choice.** When any layer in the frame contains a backdrop draw, the
|
||||
entire frame renders into `source_texture` (a full-resolution single-sample texture owned by the
|
||||
backdrop pipeline) instead of directly into the swapchain. At the end of the frame, `source_texture`
|
||||
is copied to the swapchain via a single `CopyGPUTextureToTexture` call. This means the bracket has
|
||||
no mid-frame texture copy: by the time the bracket runs, `source_texture` already contains the pre-
|
||||
bracket frame contents and is the natural sampler input. When no layer in the frame has a backdrop
|
||||
draw, the existing fast path runs: the frame renders directly to the swapchain and the backdrop
|
||||
pipeline's working textures are never touched. Zero cost for backdrop-free frames.
|
||||
**Render-target choice.** When any layer in the frame contains a backdrop draw, the entire
|
||||
frame renders into `source_texture` (a full-resolution single-sample texture owned by the
|
||||
backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
|
||||
`source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
|
||||
This means the bracket has no mid-frame texture copy: by the time the bracket runs,
|
||||
`source_texture` already contains the pre-bracket frame contents and is the natural sampler
|
||||
input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
|
||||
renders directly to the swapchain and the backdrop pipeline's working textures are never
|
||||
touched. Zero cost for backdrop-free frames.
|
||||
|
||||
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
|
||||
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
|
||||
@@ -638,13 +644,20 @@ submission order. Concretely, a layer with one or more backdrops splits into thr
|
||||
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
|
||||
pass via the existing fast path.
|
||||
|
||||
The downsample runs once per layer, not once per sigma: it just copies `source_texture` to a ¼-
|
||||
resolution working texture and doesn't depend on the kernel. Each unique sigma in the layer triggers
|
||||
one H-blur (reads `downsample_texture`, writes `h_blur_texture`) and one V-composite (reads
|
||||
`h_blur_texture`, writes `source_texture` per-primitive with the SDF mask). Sub-batch coalescing in
|
||||
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced V-
|
||||
composite draw call; non-contiguous same-sigma backdrops still share the H-blur output but issue
|
||||
separate V-composite draws.
|
||||
Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
|
||||
`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
|
||||
based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
|
||||
from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
|
||||
`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
|
||||
and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
|
||||
and tint, and writes the result to `source_texture`. Sub-batch coalescing in
|
||||
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
|
||||
composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
|
||||
composite draws.
|
||||
|
||||
The working textures are sized at the full swapchain resolution; larger downsample factors only
|
||||
fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
|
||||
for the factor-selection table and rationale).
|
||||
|
||||
#### Submission-order trade-off
|
||||
|
||||
@@ -654,12 +667,12 @@ layer. A non-backdrop sub-batch submitted between two backdrops still renders in
|
||||
bracket), not at its submission position. Worked example:
|
||||
|
||||
```
|
||||
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
|
||||
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
|
||||
draw.rectangle_backdrop(layer, panelA, 12) // 2 Backdrop → Bracket (sees: bg + blue card)
|
||||
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
|
||||
draw.rectangle_backdrop(layer, panelB, 12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
|
||||
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
|
||||
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
|
||||
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
|
||||
draw.gaussian_blur(layer, panelA, sigma=12) // 2 Backdrop → Bracket (sees: bg + blue card)
|
||||
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
|
||||
draw.gaussian_blur(layer, panelB, sigma=12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
|
||||
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
|
||||
```
|
||||
|
||||
In this layer, panelB does *not* see card_red — even though card_red was submitted before panelB —
|
||||
@@ -674,11 +687,11 @@ card_red:
|
||||
base := draw.begin(...)
|
||||
draw.rectangle(base, bg, GRAY)
|
||||
draw.rectangle(base, card_blue, BLUE)
|
||||
draw.rectangle_backdrop(base, panelA, 12) // panelA in base layer's bracket
|
||||
draw.gaussian_blur(base, panelA, sigma=12) // panelA in base layer's bracket
|
||||
|
||||
top := draw.new_layer(base, ...)
|
||||
draw.rectangle(top, card_red, RED)
|
||||
draw.rectangle_backdrop(top, panelB, 12) // top layer's bracket; sees base + card_red
|
||||
draw.gaussian_blur(top, panelB, sigma=12) // top layer's bracket; sees base + card_red
|
||||
draw.text(top, "label", ...)
|
||||
```
|
||||
|
||||
@@ -708,29 +721,30 @@ draws, `position` carries actual world-space geometry. For SDF draws, `position`
|
||||
corners (0,0 to 1,1) and the vertex shader computes world-space position from the storage-buffer
|
||||
primitive's bounds.
|
||||
|
||||
The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
|
||||
The `Base_2D_Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
|
||||
|
||||
```
|
||||
Primitive :: struct {
|
||||
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
|
||||
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
|
||||
flags: u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
|
||||
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
|
||||
_pad: f32, // 28: reserved for future use
|
||||
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
|
||||
uv: Uv_Or_Effects, // 64: texture UV rect or gradient/outline parameters (16 bytes)
|
||||
Base_2D_Primitive :: struct {
|
||||
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
|
||||
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
|
||||
flags: u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
|
||||
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
|
||||
_pad: f32, // 28: reserved for future use
|
||||
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
|
||||
uv_rect: [4]f32, // 64: texture UV coordinates. Read when .Textured.
|
||||
effects: Gradient_Outline, // 80: gradient and/or outline parameters (16 bytes).
|
||||
}
|
||||
// Total: 80 bytes (std430 aligned)
|
||||
// Total: 96 bytes (std430 aligned)
|
||||
```
|
||||
|
||||
`Shape_Params` is a `#raw_union` over `RRect_Params`, `NGon_Params`, `Ellipse_Params`, and
|
||||
`Ring_Arc_Params` (plus a `raw: [8]f32` view), defined in `pipeline_2d_base.odin`. Each SDF kind
|
||||
writes its own params variant; the fragment shader reads the appropriate fields based on `Shape_Kind`.
|
||||
`Uv_Or_Effects` is a `#raw_union` that aliases `[4]f32` (texture UV rect: u_min, v_min, u_max,
|
||||
v_max) with a `Gradient_Outline` struct containing `gradient_color: Color`, `outline_color: Color`,
|
||||
`Gradient_Outline` is a 16-byte struct containing `gradient_color: Color`, `outline_color: Color`,
|
||||
`gradient_dir_sc: u32` (packed f16 cos/sin pair), and `outline_packed: u32` (packed f16 outline
|
||||
width). The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits 8+
|
||||
via `pack_kind_flags`.
|
||||
width). It is independent of `uv_rect`, so a primitive can carry texture and outline parameters at
|
||||
the same time. The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits
|
||||
8+ via `pack_kind_flags`.
|
||||
|
||||
### Draw submission order
|
||||
|
||||
@@ -754,7 +768,7 @@ pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData
|
||||
**unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
|
||||
mode with `mode = 0`, sampling the SDL_ttf atlas texture.
|
||||
|
||||
A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
|
||||
MSDF (multi-channel signed distance field) text rendering may be evaluated later, which would
|
||||
allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
|
||||
|
||||
- Offline atlas generation via Chlumský's msdf-atlas-gen tool.
|
||||
@@ -763,8 +777,7 @@ allow resolution-independent glyph rendering from a single small atlas per font.
|
||||
already exists for the four current SDF kinds).
|
||||
- Potential removal of the SDL_ttf dependency.
|
||||
|
||||
This is explicitly deferred. The SDF shape migration is independent of and does not block text
|
||||
changes.
|
||||
This is explicitly deferred.
|
||||
|
||||
**References:**
|
||||
|
||||
@@ -778,8 +791,8 @@ changes.
|
||||
### Textures
|
||||
|
||||
Textures plug into the existing main pipeline — no additional GPU pipeline, no shader rewrite. The
|
||||
work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
|
||||
that route into the existing tessellated and SDF paths respectively.
|
||||
work is a resource layer (registration, upload, sampling, lifecycle) plus a `Texture_Fill` Brush
|
||||
variant that routes the existing shape procs through the SDF path with the `.Textured` flag set.
|
||||
|
||||
#### Why draw owns registered textures
|
||||
|
||||
@@ -829,22 +842,25 @@ with the same texture but different samplers produce separate draw calls, which
|
||||
|
||||
#### Textured draw procs
|
||||
|
||||
Textured rectangles route through the existing SDF path via `rectangle_texture`, which mirrors
|
||||
`rectangle` exactly — same parameters for radii, origin, rotation, feather — with the `color`
|
||||
parameter replaced by a `Texture_Id`, an optional `tint`, a `uv_rect`, and a `Sampler_Preset`.
|
||||
Textures share the same shape procs as colors and gradients. Each shape proc takes a `Brush`
|
||||
union as its fill source; passing a `Texture_Fill` value (carrying `Texture_Id`, `tint`,
|
||||
`uv_rect`, and `Sampler_Preset`) routes the draw through the SDF path with the `.Textured`
|
||||
flag set. There is no dedicated `rectangle_texture` / `circle_texture` proc — the same
|
||||
`rectangle`, `circle`, `ellipse`, `polygon`, `ring`, `line`, and `line_strip` procs handle
|
||||
all fill sources.
|
||||
|
||||
An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
|
||||
quads, on the theory that the tessellated path's lower register count would improve occupancy at
|
||||
large fragment counts. Both paths are well within the ≤24-register main pipeline budget — both run at
|
||||
full occupancy on every target architecture (Valhall and above). The remaining ALU difference (~15
|
||||
extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise. Meanwhile,
|
||||
splitting into a separate pipeline would add ~1–5μs per pipeline bind on the CPU side per scissor,
|
||||
matching or exceeding the GPU-side savings. Within the main pipeline, unified remains strictly better.
|
||||
A separate tessellated proc for "simple" fullscreen quads was considered on the theory that
|
||||
the tessellated path's lower register count would improve occupancy at large fragment counts.
|
||||
Both paths are well within the ≤24-register main pipeline budget — both run at full
|
||||
occupancy on every target architecture (Valhall and above). The remaining ALU difference
|
||||
(~15 extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise.
|
||||
Meanwhile, splitting into a separate pipeline would add ~1–5μs per pipeline bind on the CPU
|
||||
side per scissor, matching or exceeding the GPU-side savings. Within the main pipeline,
|
||||
unified remains strictly better.
|
||||
|
||||
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `rectangle_texture`,
|
||||
`circle`, `ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients and outlines are optional
|
||||
parameters on each proc rather than separate overloads. Future per-shape texture variants
|
||||
(`circle_texture`, `ellipse_texture`) are additive.
|
||||
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `circle`,
|
||||
`ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients, textures, and outlines are
|
||||
selected via the `Brush` union and optional outline parameters rather than separate overloads.
|
||||
|
||||
#### What SDF anti-aliasing does and does not do for textured draws
|
||||
|
||||
@@ -858,8 +874,8 @@ depends on how closely the display size matches the SDL_ttf atlas's rasterized s
|
||||
#### Fit modes are a computation layer, not a renderer concept
|
||||
|
||||
Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
|
||||
sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
|
||||
renderer has no knowledge of fit modes — it samples whatever UV region it is given.
|
||||
sub-region computations on top of the `uv_rect` field of `Texture_Fill`. The renderer has no
|
||||
knowledge of fit modes — it samples whatever UV region it is given.
|
||||
|
||||
A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
|
||||
mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
|
||||
@@ -883,13 +899,13 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si
|
||||
|
||||
Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
|
||||
`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
|
||||
existing rectangle handling: `fit_params` computes UVs from the fit mode, then
|
||||
`rectangle_texture` is called with the appropriate radii (zero for sharp corners, per-corner values
|
||||
from Clay's `cornerRadius` otherwise).
|
||||
existing rectangle handling: `fit_params` computes UVs from the fit mode, then `rectangle` is
|
||||
called with a `Texture_Fill` brush and the appropriate radii (zero for sharp corners, per-corner
|
||||
values from Clay's `cornerRadius` otherwise).
|
||||
|
||||
#### Deferred features
|
||||
|
||||
The following are plumbed in the descriptor but not implemented in phase 1:
|
||||
The following are plumbed in `Texture_Desc` but not yet implemented:
|
||||
|
||||
- **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
|
||||
- **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
|
||||
@@ -897,7 +913,6 @@ The following are plumbed in the descriptor but not implemented in phase 1:
|
||||
- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
|
||||
- **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
|
||||
- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
|
||||
- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, `polygon_texture` — potential future additions, following the existing naming convention.
|
||||
|
||||
**References:**
|
||||
|
||||
|
||||
+33
-32
@@ -21,16 +21,16 @@ import sdl "vendor:sdl3"
|
||||
// sigma_phys ≤ 8 → factor = 2
|
||||
// sigma_phys > 8 → factor = 4 (capped)
|
||||
//
|
||||
// Capped at factor=4: master's preference for visual quality over bandwidth at the high end.
|
||||
// Larger factors (8 and 16) would lose more high-frequency detail than the kernel can mask
|
||||
// even with the H+V split, and the bandwidth saving is small (the work region also shrinks
|
||||
// quadratically, so most of the savings are already captured at factor=4).
|
||||
// Capped at factor=4 to favor visual quality over bandwidth at the high end. Larger factors
|
||||
// (8 and 16) would lose more high-frequency detail than the kernel can mask even with the
|
||||
// H+V split, and the bandwidth saving is small (the work region also shrinks quadratically,
|
||||
// so most of the savings are already captured at factor=4).
|
||||
//
|
||||
// Working textures are sized at full swapchain resolution to support factor=1. Larger factors
|
||||
// just write to a smaller sub-rect via viewport-limited rendering. Memory cost: ½-res → full-
|
||||
// res working textures means 4× more bytes per working texture (2 textures, RGBA8: roughly
|
||||
// 16 MB at 1080p, 64 MB at 4K). On modern GPUs this is well within budget; on Mali Valhall
|
||||
// SBCs it's negligible against unified-memory headroom.
|
||||
// just write to a smaller sub-rect via viewport-limited rendering. Memory cost: full-res
|
||||
// working textures (2 textures, RGBA8) is roughly 16 MB at 1080p, 64 MB at 4K. On modern
|
||||
// GPUs this is well within budget; on Mali Valhall SBCs it's negligible against unified-
|
||||
// memory headroom.
|
||||
//
|
||||
// The shaders read the factor as a uniform. The downsample shader has three paths (factor=1
|
||||
// identity, factor=2 single bilinear tap, factor>=4 four bilinear taps with offsets scaling
|
||||
@@ -86,7 +86,7 @@ Backdrop_Vert_Uniforms :: struct {
|
||||
// shaders/source/backdrop_downsample.frag.
|
||||
Backdrop_Downsample_Frag_Uniforms :: struct {
|
||||
inv_source_size: [2]f32, // 0: 8 — 1.0 / source_texture pixel dimensions (full-res)
|
||||
downsample_factor: u32, // 8: 4 — 2 or 4 (selects 1-tap vs 4-tap path in shader)
|
||||
downsample_factor: u32, // 8: 4 — 1, 2, or 4 (selects identity / 1-tap / 4-tap path in shader)
|
||||
_pad0: u32, // 12: 4
|
||||
}
|
||||
|
||||
@@ -120,11 +120,12 @@ Pipeline_2D_Backdrop :: struct {
|
||||
primitive_buffer: Buffer,
|
||||
|
||||
// Working textures, allocated once at swapchain resolution and recreated only on resize.
|
||||
// `source_texture` is full-resolution; the other two are ¼-res. All single-sample.
|
||||
// All three are sized at full swapchain resolution and single-sample. Larger downsample
|
||||
// factors fill only a sub-rect via viewport-limited rendering (see file-header comment).
|
||||
// source_texture — when any backdrop draw exists this frame, the entire frame renders
|
||||
// here instead of the swapchain (Approach B). Copied to the swapchain
|
||||
// at frame end. Acts as the bracket's snapshot input by virtue of
|
||||
// already containing the pre-bracket frame.
|
||||
// here instead of the swapchain. Copied to the swapchain at frame
|
||||
// end. Acts as the bracket's snapshot input by virtue of already
|
||||
// containing the pre-bracket frame.
|
||||
// downsample_texture — written by the downsample PSO. Read by the blur PSO in mode 0.
|
||||
// h_blur_texture — written by the blur PSO in mode 0. Read by the blur PSO in mode 1.
|
||||
source_texture: ^sdl.GPUTexture,
|
||||
@@ -243,7 +244,7 @@ create_pipeline_2d_backdrop :: proc(
|
||||
|
||||
//----- Downsample PSO ----------------------------------
|
||||
// Single bilinear sample, blend disabled. No vertex buffer (gl_VertexIndex 0..2 emits the
|
||||
// fullscreen triangle). Single-sample target (the ¼-res working textures are never MSAA).
|
||||
// fullscreen triangle). Single-sample target (working textures are never MSAA).
|
||||
downsample_target := sdl.GPUColorTargetDescription {
|
||||
format = swapchain_format,
|
||||
blend_state = sdl.GPUColorTargetBlendState{enable_blend = false},
|
||||
@@ -350,9 +351,9 @@ destroy_pipeline_2d_backdrop :: proc(device: ^sdl.GPUDevice, pipeline: ^Pipeline
|
||||
// ---------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
// Allocate (or reallocate, on resize) the three working textures that the backdrop bracket
|
||||
// uses. `source_texture` is full swapchain resolution; the other two are ¼-res. All single-
|
||||
// sample, all share the swapchain format, all need {.COLOR_TARGET, .SAMPLER} usage so they
|
||||
// can be written by render passes and read by subsequent passes.
|
||||
// uses. All three are sized at full swapchain resolution, single-sample, share the swapchain
|
||||
// format, and need {.COLOR_TARGET, .SAMPLER} usage so they can be written by render passes
|
||||
// and read by subsequent passes.
|
||||
//
|
||||
// Recreates on dimension change only — same-size frames hit the early-out and skip GPU
|
||||
// resource churn.
|
||||
@@ -466,19 +467,19 @@ ensure_backdrop_textures :: proc(device: ^sdl.GPUDevice, format: sdl.GPUTextureF
|
||||
// `i in [1, pair_count)` and does two texture fetches per pair — one at +offset, one at
|
||||
// -offset — for a total of 1 + 2*(pair_count-1) bilinear fetches per fragment.
|
||||
//
|
||||
// `sigma` is the true Gaussian standard deviation in the kernel's working-space units (¼-res
|
||||
// texels, after the caller has converted from logical pixels via dpi_scaling and the
|
||||
// downsample factor). The kernel extent reaches ±3σ, capturing 99.7% of the Gaussian's
|
||||
// `sigma` is the true Gaussian standard deviation in the kernel's working-space units
|
||||
// (working-resolution texels, after the caller has converted from logical pixels via
|
||||
// dpi_scaling and the downsample factor). The kernel extent reaches ±3σ, capturing 99.7% of
|
||||
// the Gaussian's
|
||||
// mass; weights beyond that contribute imperceptibly. sigma <= 0 produces a degenerate
|
||||
// kernel `{1, 0}` that acts as a sharp pass-through. After the loop, the discrete weights
|
||||
// are normalized so they sum to 1.0 (truncating at ±3σ loses a tiny amount of mass; we
|
||||
// renormalize to preserve overall image brightness).
|
||||
//
|
||||
// Earlier versions of this routine ported RAD Debugger's algorithm verbatim, which derives
|
||||
// stdev from a tap-count parameter (`stdev = (blur_count-1)/2`). That made the parameter
|
||||
// name misleading: the user thought they were passing σ but were actually passing
|
||||
// half-kernel-width. This version takes σ directly and derives the tap count from it,
|
||||
// matching what callers expect when they read "gaussian_sigma".
|
||||
// Note on the parameter contract: this routine takes σ directly and derives the tap count
|
||||
// from it, rather than the inverse (RAD Debugger's algorithm passes a tap count and derives
|
||||
// `stdev = (blur_count-1)/2`). Taking σ directly matches what callers expect when they read
|
||||
// "gaussian_sigma" — passing tap count under that name was a footgun.
|
||||
@(private)
|
||||
compute_blur_kernel :: proc(sigma: f32, kernel: ^[MAX_BACKDROP_KERNEL_PAIRS][4]f32) -> (pair_count: u32) {
|
||||
if sigma <= 0 {
|
||||
@@ -624,7 +625,7 @@ upload_backdrop_primitives :: proc(device: ^sdl.GPUDevice, pass: ^sdl.GPUCopyPas
|
||||
// ---------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
// Returns true if any sub-batch in any layer this frame is .Backdrop kind. Called once at the
|
||||
// top of `end()` to decide whether to route the whole frame to source_texture (Approach B).
|
||||
// top of `end()` to decide whether to route the whole frame to source_texture.
|
||||
// O(total sub-batches) but with an early-exit on the first hit, so typical cost is tiny.
|
||||
@(private)
|
||||
frame_has_backdrop :: proc() -> bool {
|
||||
@@ -742,10 +743,10 @@ compute_backdrop_group_work_region :: proc(
|
||||
// target viewport, per-primitive SDF discard handles masking and applies the tint. Each
|
||||
// sub-batch in the group is one instanced draw.
|
||||
//
|
||||
// V-blur was historically combined with the composite into a single shader invocation, but
|
||||
// that produced a horizontal-vs-vertical asymmetry artifact (horizontal source features
|
||||
// looked sharper than vertical ones inside the panel). Splitting V-blur into its own
|
||||
// working→working pass restores symmetry by making H and V blurs structurally identical.
|
||||
// V-blur is run as its own working→working pass rather than folded into the composite. The
|
||||
// folded variant produces a horizontal-vs-vertical asymmetry artifact (horizontal source
|
||||
// features end up looking sharper than vertical ones inside the panel). Matching V's
|
||||
// structure exactly to H's restores symmetry.
|
||||
//
|
||||
// On exit, source_texture contains the pre-bracket contents plus all backdrop primitives
|
||||
// composited on top. The caller then runs Pass B (post-bracket non-backdrop sub-batches) on
|
||||
@@ -1011,8 +1012,8 @@ run_backdrop_bracket :: proc(
|
||||
// geometry. The caller sets `color` (tint) on the returned primitive before submitting.
|
||||
//
|
||||
// No rotation, no outline — backdrop primitives are intentionally limited to axis-aligned
|
||||
// RRects in v1. Rotation breaks screen-space blur sampling visually; outline would be a
|
||||
// specialized edge effect that belongs in its own primitive type.
|
||||
// RRects. Rotation breaks screen-space blur sampling visually; outline would be a specialized
|
||||
// edge effect that belongs in its own primitive type.
|
||||
@(private)
|
||||
build_backdrop_primitive :: proc(
|
||||
rect: Rectangle,
|
||||
|
||||
+5
-5
@@ -830,9 +830,9 @@ end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = DF
|
||||
}
|
||||
|
||||
// Pre-scan: if any layer this frame has a backdrop sub-batch, route the entire frame to
|
||||
// source_texture (Approach B) so the bracket can sample the pre-bracket framebuffer
|
||||
// without a mid-frame texture copy. Frames without any backdrop hit the existing fast
|
||||
// path and never touch the backdrop pipeline's working textures.
|
||||
// source_texture so the bracket can sample the pre-bracket framebuffer without a mid-
|
||||
// frame texture copy. Frames without any backdrop hit the existing fast path and never
|
||||
// touch the backdrop pipeline's working textures.
|
||||
has_backdrop := frame_has_backdrop()
|
||||
|
||||
// Upload primitives to GPU (vertices, indices, SDF prims, and backdrop prims share one
|
||||
@@ -880,8 +880,8 @@ end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = DF
|
||||
draw_layer(device, window, cmd_buffer, render_texture, width, height, clear_color_f32, &layer)
|
||||
}
|
||||
|
||||
// Approach B finalization: when we rendered into source_texture, copy it to the swapchain.
|
||||
// Single CopyGPUTextureToTexture call per frame, only when backdrop content was present.
|
||||
// When we rendered into source_texture, copy it to the swapchain. Single
|
||||
// CopyGPUTextureToTexture call per frame, only when backdrop content was present.
|
||||
if has_backdrop {
|
||||
copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
|
||||
sdl.CopyGPUTextureToTexture(
|
||||
|
||||
@@ -20,7 +20,7 @@ texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
|
||||
//
|
||||
// Returns ok=false when:
|
||||
// - qrcode_buf is invalid (qrcode.get_size returns 0).
|
||||
// - texture_buf is smaller than to_texture_size(qrcode_buf).
|
||||
// - texture_buf is smaller than texture_size(qrcode_buf).
|
||||
@(require_results)
|
||||
to_texture :: proc(
|
||||
qrcode_buf: []u8,
|
||||
|
||||
@@ -10,8 +10,8 @@ import cyber "../cybersteel"
|
||||
|
||||
// Backdrop example.
|
||||
//
|
||||
// Verifies the Stage D bracket scheduler end-to-end. The demo is structured as three zones in
|
||||
// one window so we can stress-test the cases that matter:
|
||||
// Exercises the bracket scheduler end-to-end. The demo is structured as three zones in one
|
||||
// window so we can stress-test the cases that matter:
|
||||
//
|
||||
// Zone 1 (top, base layer): animated colorful background + two side-by-side frosted panels
|
||||
// with DIFFERENT sigmas and DIFFERENT tints. Tests sigma grouping
|
||||
@@ -269,9 +269,8 @@ gaussian_blur :: proc() {
|
||||
// SPACE : reset to sigma=10
|
||||
// T : toggle the test rectangle on top of the panel
|
||||
//
|
||||
// Sigma is printed to the console label and to the title bar so you can correlate visual
|
||||
// behavior with kernel state (which is also logged via the [backdrop] debug print in
|
||||
// backdrop.odin's compute_blur_kernel callsite).
|
||||
// Sigma is printed to the title bar so you can correlate visual behavior with the numeric
|
||||
// value as you adjust it.
|
||||
gaussian_blur_debug :: proc() {
|
||||
if !sdl.Init({.VIDEO}) do os.exit(1)
|
||||
window := sdl.CreateWindow("Backdrop debug", 800, 600, {.HIGH_PIXEL_DENSITY})
|
||||
|
||||
@@ -116,9 +116,9 @@ Gradient_Outline :: struct {
|
||||
// avoiding per-pixel trigonometry in the fragment shader. Only read when .Rotated is set.
|
||||
//
|
||||
// Named Base_2D_Primitive (not just Primitive) to disambiguate from Backdrop_Primitive in
|
||||
// pipeline_2d_backdrop.odin. The two pipelines have unrelated GPU layouts and unrelated
|
||||
// fragment-shader contracts; pairing each with its own primitive type keeps cross-references
|
||||
// unambiguous when grepping the codebase.
|
||||
// backdrop.odin. The two pipelines have unrelated GPU layouts and unrelated fragment-shader
|
||||
// contracts; pairing each with its own primitive type keeps cross-references unambiguous
|
||||
// when grepping the codebase.
|
||||
Base_2D_Primitive :: struct {
|
||||
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y (world-space, pre-DPI)
|
||||
color: Color, // 16: u8x4, fill color / gradient start color / texture tint
|
||||
|
||||
@@ -1,19 +1,18 @@
|
||||
#version 450 core
|
||||
|
||||
// Unified backdrop blur fragment shader.
|
||||
// Handles both H-blur (mode 0, blurs the ¼-resolution downsample texture into
|
||||
// the ¼-resolution h_blur texture) and V-blur+composite (mode 1, blurs h_blur
|
||||
// vertically, masks via RRect SDF, applies tint, composites outline, and writes
|
||||
// to the main render target with premultiplied alpha).
|
||||
// Handles both the 1D separable blur passes (mode 0, used for BOTH the H-pass and V-pass;
|
||||
// `direction` picks the axis) and the composite pass (mode 1, reads the fully-blurred
|
||||
// working texture, masks via RRect SDF, applies tint, and writes to source_texture with
|
||||
// premultiplied-over blending). Working textures are sized at the full swapchain resolution;
|
||||
// downsampled content occupies only a sub-rect at downsample factor > 1 (set via viewport).
|
||||
//
|
||||
// Following RAD's pattern, V-mode replaces a separate composite pass: the SDF
|
||||
// discard limits V-blur work to the masked region, and the per-primitive tint
|
||||
// is folded in. Output blends with the main render target via the standard
|
||||
// premultiplied-over blend state (ONE, ONE_MINUS_SRC_ALPHA).
|
||||
// The composite blends with source_texture via the standard premultiplied-over blend state
|
||||
// (ONE, ONE_MINUS_SRC_ALPHA).
|
||||
//
|
||||
// Backdrop primitives are tint-only — there is no outline. A specialized edge
|
||||
// effect (e.g. liquid-glass-style refraction outlines) would be implemented
|
||||
// as a dedicated primitive type with its own pipeline.
|
||||
// Backdrop primitives are tint-only — there is no outline. A specialized edge effect
|
||||
// (e.g. liquid-glass-style refraction outlines) would be implemented as a dedicated
|
||||
// primitive type with its own pipeline.
|
||||
//
|
||||
// Two modes, structurally distinct:
|
||||
//
|
||||
@@ -30,11 +29,11 @@
|
||||
// (gl_FragCoord.xy * inv_downsample_factor) * inv_working_size.
|
||||
// No kernel is applied here — the blur is already complete.
|
||||
//
|
||||
// Splitting V-blur out of the composite pass (an earlier version combined them) was needed
|
||||
// to avoid a horizontal-vs-vertical asymmetry artifact: when the V-blur sampled the H-blur
|
||||
// output through the bilinear-upsample/SDF-mask/tint pipeline in one shader invocation,
|
||||
// horizontal source features ended up looking sharper than vertical ones. Running V-blur as
|
||||
// its own working→working pass (matching H's structure exactly) restores symmetry.
|
||||
// V-blur is run as its own working→working pass rather than folded into the composite. The
|
||||
// folded variant produced a horizontal-vs-vertical asymmetry artifact: when V-blur sampled
|
||||
// the H-blur output through the bilinear-upsample/SDF-mask/tint pipeline in one shader
|
||||
// invocation, horizontal source features ended up looking sharper than vertical ones.
|
||||
// Matching V's structure exactly to H's restores symmetry.
|
||||
|
||||
const uint MAX_KERNEL_PAIRS = 32;
|
||||
|
||||
@@ -140,16 +139,16 @@ void main() {
|
||||
vec2 uv = (gl_FragCoord.xy * inv_downsample_factor) * inv_working_size;
|
||||
vec3 color = texture(blur_input_tex, uv).rgb;
|
||||
|
||||
// Tint composition (Option B semantics): inside the masked region the panel is fully
|
||||
// opaque — it completely hides the original framebuffer content, just like real frosted
|
||||
// glass and like iOS UIBlurEffect / CSS backdrop-filter. f_color.rgb specifies the tint
|
||||
// color; f_color.a specifies the tint *mix strength* (NOT panel opacity). At alpha=0 we
|
||||
// see the pure blur; at alpha=255 we see the blur fully multiplied by the tint color.
|
||||
// Tint composition: inside the masked region the panel is fully opaque — it completely
|
||||
// hides the original framebuffer content, just like real frosted glass and like iOS
|
||||
// UIBlurEffect / CSS backdrop-filter. f_color.rgb specifies the tint color; f_color.a
|
||||
// specifies the tint *mix strength* (NOT panel opacity). At alpha=0 we see the pure
|
||||
// blur; at alpha=255 we see the blur fully multiplied by the tint color.
|
||||
//
|
||||
// Output is premultiplied to match the ONE, ONE_MINUS_SRC_ALPHA blend state. Coverage
|
||||
// (the SDF mask's edge AA) modulates only the alpha channel, never the panel-vs-source
|
||||
// blend; that way edge pixels still feather correctly without re-introducing the bug
|
||||
// where mid-panel pixels became semi-transparent.
|
||||
// blend; that way edge pixels still feather correctly while mid-panel pixels stay fully
|
||||
// opaque.
|
||||
mediump vec3 tinted = mix(color, color * f_color.rgb, f_color.a);
|
||||
mediump float coverage = sdf_alpha(d_n, h_n);
|
||||
out_color = vec4(tinted * coverage, coverage);
|
||||
|
||||
@@ -1,18 +1,19 @@
|
||||
#version 450 core
|
||||
|
||||
// Unified backdrop blur vertex shader.
|
||||
// Handles both H-blur (fullscreen triangle, mode 0) and V-blur+composite (instanced
|
||||
// unit-quad over Backdrop_Primitive storage buffer, mode 1) for the second PSO of
|
||||
// the backdrop bracket. The first PSO (downsample) uses backdrop_fullscreen.vert.
|
||||
// Handles both the 1D separable blur passes (fullscreen triangle, mode 0; used for
|
||||
// BOTH the H-pass and V-pass) and the composite pass (instanced unit-quad over
|
||||
// Backdrop_Primitive storage buffer, mode 1) for the second PSO of the backdrop bracket.
|
||||
// The first PSO (downsample) uses backdrop_fullscreen.vert.
|
||||
//
|
||||
// No vertex buffer for either mode. Mode 0 uses gl_VertexIndex 0..2 for a single
|
||||
// fullscreen triangle; mode 1 uses gl_VertexIndex 0..5 for a unit-quad (two
|
||||
// triangles, TRIANGLELIST topology) and gl_InstanceIndex to select the primitive.
|
||||
//
|
||||
// Mode 0 viewport+scissor are CPU-set per layer-bracket to the work region (union
|
||||
// AABB of backdrop primitives + 3*max_sigma, clamped to swapchain bounds). Mode 1
|
||||
// renders into the main render target with the screen-space orthographic projection;
|
||||
// the per-primitive bounds drive the quad in screen space.
|
||||
// Mode 0 viewport+scissor are CPU-set per sigma group to the work region (union AABB
|
||||
// of that group's backdrop primitives + halo, clamped to swapchain bounds). Mode 1
|
||||
// renders into source_texture with the screen-space orthographic projection; the
|
||||
// per-primitive bounds drive the quad in screen space.
|
||||
//
|
||||
// Backdrop primitives have NO rotation — backdrop sampling is in screen space, so
|
||||
// a rotated mask over a stationary blur sample would look wrong.
|
||||
@@ -46,11 +47,11 @@ layout(set = 1, binding = 0) uniform Uniforms {
|
||||
// vec2 and scalar tail packs tight to land the struct at a clean 48-byte
|
||||
// stride (a multiple of 16, so the array stride needs no rounding either).
|
||||
// Field semantics match the CPU-side Backdrop_Primitive declared in
|
||||
// levlib/draw/pipeline_2d_backdrop.odin; keep both in sync.
|
||||
// levlib/draw/backdrop.odin; keep both in sync.
|
||||
//
|
||||
// Backdrop primitives are tint-only in v1: outline is intentionally absent.
|
||||
// Future specialized effects (e.g. liquid-glass-style edges) would be a
|
||||
// dedicated primitive type with its own pipeline rather than a flag bit here.
|
||||
// Backdrop primitives are tint-only: outline is intentionally absent. Specialized
|
||||
// edge effects (e.g. liquid-glass-style refraction outlines) would be a dedicated
|
||||
// primitive type with its own pipeline rather than a flag bit here.
|
||||
struct Backdrop_Primitive {
|
||||
vec4 bounds; // 0-15: min_xy, max_xy (world-space)
|
||||
vec4 radii; // 16-31: per-corner radii (physical px)
|
||||
|
||||
@@ -2,9 +2,9 @@
|
||||
|
||||
// Backdrop downsample fragment shader.
|
||||
// Reads source_texture (full-resolution snapshot of pre-bracket framebuffer contents) and
|
||||
// writes a downsampled copy at factor 1, 2, 4, 8, or 16. The output is the working texture
|
||||
// (sized at full swapchain resolution); larger factors only fill a sub-rect of it via the
|
||||
// CPU-set viewport. See backdrop.odin for the factor selection table (Flutter-style).
|
||||
// writes a downsampled copy at factor 1, 2, or 4. The output is the working texture (sized
|
||||
// at full swapchain resolution); larger factors only fill a sub-rect of it via the CPU-set
|
||||
// viewport. See backdrop.odin for the factor selection table (Flutter-style).
|
||||
//
|
||||
// Shader paths by factor:
|
||||
//
|
||||
@@ -15,15 +15,12 @@
|
||||
// factor=2: each output covers a 2×2 source block. Single bilinear tap at the shared
|
||||
// corner reads all 4 source pixels with 0.25 weight.
|
||||
//
|
||||
// factor>=4: each output covers a (factor)×(factor) source block. We use 4 bilinear taps,
|
||||
// each at the shared corner of a (factor/2)×(factor/2) sub-block. Each tap reads
|
||||
// 4 source pixels uniformly; combined, the 4 taps sample 16 source pixels arranged
|
||||
// uniformly across the block. This is an approximation of a true (factor)² box
|
||||
// filter — exact at factor=4 (16 pixels = full coverage), undersampled at factor=8
|
||||
// (16 pixels of 64) and factor=16 (16 of 256). Flutter uses a richer 13-tap COD-
|
||||
// style downsample shader at high factors; we accept the simpler 4-tap pattern
|
||||
// for now since the high-factor cases come with large kernels that mask any
|
||||
// residual aliasing.
|
||||
// factor=4: each output covers a 4×4 source block. We use 4 bilinear taps, each at the
|
||||
// shared corner of a 2×2 sub-block. Each tap reads 4 source pixels uniformly;
|
||||
// combined, the 4 taps sample 16 source pixels arranged uniformly across the
|
||||
// block (full coverage at factor=4). The factor>=4 path is structured so the
|
||||
// same shader code would extend to factor=8 (16 pixels of 64) or factor=16 (16
|
||||
// of 256) if the CPU-side cap is ever raised, though the current cap is 4.
|
||||
//
|
||||
// The viewport+scissor are set by the CPU to limit output to the layer's work region in
|
||||
// working-texture coords (work_region_phys / factor), clamped to the texture bounds.
|
||||
|
||||
@@ -45,7 +45,7 @@ layout(std430, set = 0, binding = 0) readonly buffer Base_2D_Primitives {
|
||||
// ---------- Entry point ----------
|
||||
void main() {
|
||||
if (mode == 0u) {
|
||||
// ---- Mode 0: Tessellated (legacy) ----
|
||||
// ---- Mode 0: Tessellated (used for text and arbitrary user geometry) ----
|
||||
f_color = v_color;
|
||||
f_local_or_uv = v_uv;
|
||||
f_params = vec4(0.0);
|
||||
|
||||
+2
-1
@@ -53,7 +53,8 @@ emit_rectangle :: proc(x, y, width, height: f32, color: Color, vertices: []Verte
|
||||
}
|
||||
|
||||
// Internal — submit an SDF primitive with optional texture binding.
|
||||
// Replaces the old prepare_sdf_primitive and prepare_sdf_primitive_textured.
|
||||
// The texture-aware counterpart of `draw.prepare_sdf_primitive`; lets shape procs route a
|
||||
// texture_id and sampler into the sub-batch without growing the public API.
|
||||
@(private)
|
||||
prepare_sdf_primitive_ex :: proc(
|
||||
layer: ^Layer,
|
||||
|
||||
Reference in New Issue
Block a user