package draw import "core:log" import "core:math" import "core:mem" import sdl "vendor:sdl3" // Adaptive downsample design (Flutter-style). // // The bracket picks a downsample factor per-sigma-group, not as a global constant. The choice // is driven by Flutter's `CalculateScale` formula in // impeller/entity/contents/filters/gaussian_blur_filter_contents.cc (originally from Skia's // GrBlurUtils): downsample so that the sigma in working-resolution pixels stays in the // 2..4 range. This keeps the kernel reach wide enough to hide high-frequency artifacts from // the bilinear upsample at the composite, while keeping the kernel's discrete tap count // small (≤3σ reach → ≈12 paired taps). // // The full table, in physical pixels (sigma_logical * dpi_scaling): // // sigma_phys ≤ 4 → factor = 1 (no downsample; source is sampled directly) // sigma_phys ≤ 8 → factor = 2 // sigma_phys > 8 → factor = 4 (capped) // // Capped at factor=4: master's preference for visual quality over bandwidth at the high end. // Larger factors (8 and 16) would lose more high-frequency detail than the kernel can mask // even with the H+V split, and the bandwidth saving is small (the work region also shrinks // quadratically, so most of the savings are already captured at factor=4). // // Working textures are sized at full swapchain resolution to support factor=1. Larger factors // just write to a smaller sub-rect via viewport-limited rendering. Memory cost: ½-res → full- // res working textures means 4× more bytes per working texture (2 textures, RGBA8: roughly // 16 MB at 1080p, 64 MB at 4K). On modern GPUs this is well within budget; on Mali Valhall // SBCs it's negligible against unified-memory headroom. // // The shaders read the factor as a uniform. The downsample shader has three paths (factor=1 // identity, factor=2 single bilinear tap, factor>=4 four bilinear taps with offsets scaling // by factor/4). The V-composite mode of backdrop_blur.frag uses inv_downsample_factor to // scale full-res frag coords down to working-res UV. // Maximum number of (weight, offset) pairs in a single blur kernel. Each pair represents // the linear-sampling pair adjustment (one bilinear fetch covering two adjacent texels); // pair[0] is the center weight with offset 0. With 32 pairs we cover up to 63 input texels // (1 center + 31 paired symmetric taps × 2 texels each), enough for sigma values well past // the 4..24 typical UI range. Must match MAX_KERNEL_PAIRS in shaders/source/backdrop_blur.frag. MAX_BACKDROP_KERNEL_PAIRS :: 32 // Backdrop_Primitive is the GPU-side per-primitive storage layout. Mirrors the GLSL std430 // struct in shaders/source/backdrop_blur.vert. Field order is chosen so std430 alignment // rules pack the struct to a clean 48-byte natural layout (no implicit padding): vec4 // members come first (16-byte aligned at any offset), then vec2, then scalars. The total is // a multiple of 16 so the std430 array stride matches size_of(...) exactly. // // Backdrop primitives are RRect-only: rectangles, rounded rectangles, and circles // (via uniform_radii) are all expressible. Rotation is intentionally omitted — backdrop // sampling is in screen space, so a rotated mask over a stationary blur sample would look // visually wrong. iOS, CSS backdrop-filter, and Flutter BackdropFilter all enforce this // implicitly; we enforce it explicitly by leaving no rotation field. // // Outline is also intentionally omitted. A specialized edge effect (e.g. liquid-glass-style // refraction outlines) would be implemented as a dedicated primitive type with its own // pipeline rather than tacked onto this one as a flag bit. Backdrop_Primitive :: struct { bounds: [4]f32, // 0: 16 — world-space quad (min_xy, max_xy) radii: [4]f32, // 16: 16 — per-corner radii in physical pixels (BR, TR, BL, TL) half_size: [2]f32, // 32: 8 — RRect half extents (physical px) half_feather: f32, // 40: 4 — feather_px * 0.5 (SDF anti-aliasing) color: Color, // 44: 4 — tint, packed RGBA u8x4 } #assert(size_of(Backdrop_Primitive) == 48) // --------------------------------------------------------------------------------------------------------------------- // ----- Uniform blocks ---------------- // --------------------------------------------------------------------------------------------------------------------- // Vertex uniforms for the unified blur PSO (mode 0 = H-blur, mode 1 = V-composite). // Matches the GLSL Uniforms block in shaders/source/backdrop_blur.vert. The downsample // PSO has no vertex uniforms. Backdrop_Vert_Uniforms :: struct { projection: matrix[4, 4]f32, // 0: 64 — screen-space ortho (mode 1 only; mode 0 ignores) dpi_scale: f32, // 64: 4 mode: u32, // 68: 4 — 0 = H-blur fullscreen tri; 1 = V-composite instanced quads _pad0: [2]f32, // 72: 8 — std140 vec4 alignment pad } // Fragment uniforms for the downsample PSO. Matches Uniforms block in // shaders/source/backdrop_downsample.frag. Backdrop_Downsample_Frag_Uniforms :: struct { inv_source_size: [2]f32, // 0: 8 — 1.0 / source_texture pixel dimensions (full-res) downsample_factor: u32, // 8: 4 — 2 or 4 (selects 1-tap vs 4-tap path in shader) _pad0: u32, // 12: 4 } // Fragment uniforms for the unified blur PSO (mode 0 + mode 1). Matches the GLSL Uniforms // block in shaders/source/backdrop_blur.frag. The kernel array holds the linear-sampling // pair coefficients computed CPU-side via `compute_blur_kernel`. Backdrop_Frag_Uniforms :: struct { inv_working_size: [2]f32, // 0: 8 — 1.0 / working-resolution texture dimensions pair_count: u32, // 8: 4 — number of (weight, offset) pairs; pair[0] is center mode: u32, // 12: 4 — 0 = H-blur, 1 = V-composite (must match vert mode) direction: [2]f32, // 16: 8 — (1,0) for H-blur, (0,1) for V-composite inv_downsample_factor: f32, // 24: 4 — 1.0 / downsample_factor (mode 1 only; mode 0 ignores) _pad0: f32, // 28: 4 kernel: [MAX_BACKDROP_KERNEL_PAIRS][4]f32, // 32: 512 — .x = weight, .y = offset (texels) } // --------------------------------------------------------------------------------------------------------------------- // ----- Pipeline --------------- // --------------------------------------------------------------------------------------------------------------------- Pipeline_2D_Backdrop :: struct { // Two graphics pipelines. The downsample PSO is a single-bilinear-sample fullscreen pass; // the blur PSO is mode-branched (H-blur fullscreen + V-composite instanced) and shares // one shader program for both modes via a uniform `mode` selector. downsample_pipeline: ^sdl.GPUGraphicsPipeline, blur_pipeline: ^sdl.GPUGraphicsPipeline, // Per-instance Backdrop_Primitive storage buffer. Grows on demand via grow_buffer_if_needed. // All backdrop primitives across all layers in a frame share this single buffer; sub-batches // reference into it by offset. primitive_buffer: Buffer, // Working textures, allocated once at swapchain resolution and recreated only on resize. // `source_texture` is full-resolution; the other two are ¼-res. All single-sample. // source_texture — when any backdrop draw exists this frame, the entire frame renders // here instead of the swapchain (Approach B). Copied to the swapchain // at frame end. Acts as the bracket's snapshot input by virtue of // already containing the pre-bracket frame. // downsample_texture — written by the downsample PSO. Read by the blur PSO in mode 0. // h_blur_texture — written by the blur PSO in mode 0. Read by the blur PSO in mode 1. source_texture: ^sdl.GPUTexture, downsample_texture: ^sdl.GPUTexture, h_blur_texture: ^sdl.GPUTexture, // Cached pixel dimensions for resize-detection in `ensure_backdrop_textures`. cached_width: u32, cached_height: u32, // Linear-clamp sampler used for all backdrop sampling. Linear filtering is required by the // linear-sampling pair trick (one bilinear fetch covers two adjacent texels). Clamp avoids // edge-bleed at the work-region boundary. sampler: ^sdl.GPUSampler, } @(private) create_pipeline_2d_backdrop :: proc( device: ^sdl.GPUDevice, window: ^sdl.Window, ) -> ( pipeline: Pipeline_2D_Backdrop, ok: bool, ) { // On failure, clean up any partially-created resources. defer if !ok { if pipeline.sampler != nil do sdl.ReleaseGPUSampler(device, pipeline.sampler) if pipeline.primitive_buffer.gpu != nil do destroy_buffer(device, &pipeline.primitive_buffer) if pipeline.blur_pipeline != nil do sdl.ReleaseGPUGraphicsPipeline(device, pipeline.blur_pipeline) if pipeline.downsample_pipeline != nil do sdl.ReleaseGPUGraphicsPipeline(device, pipeline.downsample_pipeline) } active_shader_formats := sdl.GetGPUShaderFormats(device) if PLATFORM_SHADER_FORMAT_FLAG not_in active_shader_formats { log.errorf( "backdrop: no embedded shader matches active GPU formats; build supports %v but device reports %v", PLATFORM_SHADER_FORMAT, active_shader_formats, ) return pipeline, false } swapchain_format := sdl.GetGPUSwapchainTextureFormat(device, window) //----- Shader modules ---------------------------------- fullscreen_vert := sdl.CreateGPUShader( device, sdl.GPUShaderCreateInfo { code_size = len(BACKDROP_FULLSCREEN_VERT_RAW), code = raw_data(BACKDROP_FULLSCREEN_VERT_RAW), entrypoint = SHADER_ENTRY, format = {PLATFORM_SHADER_FORMAT_FLAG}, stage = .VERTEX, }, ) if fullscreen_vert == nil { log.errorf("Could not create backdrop fullscreen vertex shader: %s", sdl.GetError()) return pipeline, false } defer sdl.ReleaseGPUShader(device, fullscreen_vert) downsample_frag := sdl.CreateGPUShader( device, sdl.GPUShaderCreateInfo { code_size = len(BACKDROP_DOWNSAMPLE_FRAG_RAW), code = raw_data(BACKDROP_DOWNSAMPLE_FRAG_RAW), entrypoint = SHADER_ENTRY, format = {PLATFORM_SHADER_FORMAT_FLAG}, stage = .FRAGMENT, num_samplers = 1, num_uniform_buffers = 1, }, ) if downsample_frag == nil { log.errorf("Could not create backdrop downsample fragment shader: %s", sdl.GetError()) return pipeline, false } defer sdl.ReleaseGPUShader(device, downsample_frag) blur_vert := sdl.CreateGPUShader( device, sdl.GPUShaderCreateInfo { code_size = len(BACKDROP_BLUR_VERT_RAW), code = raw_data(BACKDROP_BLUR_VERT_RAW), entrypoint = SHADER_ENTRY, format = {PLATFORM_SHADER_FORMAT_FLAG}, stage = .VERTEX, num_uniform_buffers = 1, num_storage_buffers = 1, }, ) if blur_vert == nil { log.errorf("Could not create backdrop blur vertex shader: %s", sdl.GetError()) return pipeline, false } defer sdl.ReleaseGPUShader(device, blur_vert) blur_frag := sdl.CreateGPUShader( device, sdl.GPUShaderCreateInfo { code_size = len(BACKDROP_BLUR_FRAG_RAW), code = raw_data(BACKDROP_BLUR_FRAG_RAW), entrypoint = SHADER_ENTRY, format = {PLATFORM_SHADER_FORMAT_FLAG}, stage = .FRAGMENT, num_samplers = 1, num_uniform_buffers = 1, }, ) if blur_frag == nil { log.errorf("Could not create backdrop blur fragment shader: %s", sdl.GetError()) return pipeline, false } defer sdl.ReleaseGPUShader(device, blur_frag) //----- Downsample PSO ---------------------------------- // Single bilinear sample, blend disabled. No vertex buffer (gl_VertexIndex 0..2 emits the // fullscreen triangle). Single-sample target (the ¼-res working textures are never MSAA). downsample_target := sdl.GPUColorTargetDescription { format = swapchain_format, blend_state = sdl.GPUColorTargetBlendState{enable_blend = false}, } pipeline.downsample_pipeline = sdl.CreateGPUGraphicsPipeline( device, sdl.GPUGraphicsPipelineCreateInfo { vertex_shader = fullscreen_vert, fragment_shader = downsample_frag, primitive_type = .TRIANGLELIST, multisample_state = sdl.GPUMultisampleState{sample_count = ._1}, target_info = sdl.GPUGraphicsPipelineTargetInfo { color_target_descriptions = &downsample_target, num_color_targets = 1, }, }, ) if pipeline.downsample_pipeline == nil { log.errorf("Failed to create backdrop downsample graphics pipeline: %s", sdl.GetError()) return pipeline, false } //----- Blur PSO (H-blur + V-composite, mode-branched) -------------- // Premultiplied-over blend matching the main pipeline. No vertex buffer (mode 0 uses // gl_VertexIndex 0..2 fullscreen tri; mode 1 uses gl_VertexIndex 0..5 unit-quad + // gl_InstanceIndex into the storage buffer). // // Single-sample throughout: levlib does not support MSAA (see init's doc comment in // draw.odin). The whole frame renders to single-sample targets, so sample_count = ._1 // matches both mode 0 (writes h_blur_texture) and mode 1 (writes source_texture). blur_target := sdl.GPUColorTargetDescription { format = swapchain_format, blend_state = sdl.GPUColorTargetBlendState { enable_blend = true, enable_color_write_mask = true, src_color_blendfactor = .ONE, dst_color_blendfactor = .ONE_MINUS_SRC_ALPHA, color_blend_op = .ADD, src_alpha_blendfactor = .ONE, dst_alpha_blendfactor = .ONE_MINUS_SRC_ALPHA, alpha_blend_op = .ADD, color_write_mask = sdl.GPUColorComponentFlags{.R, .G, .B, .A}, }, } pipeline.blur_pipeline = sdl.CreateGPUGraphicsPipeline( device, sdl.GPUGraphicsPipelineCreateInfo { vertex_shader = blur_vert, fragment_shader = blur_frag, primitive_type = .TRIANGLELIST, multisample_state = sdl.GPUMultisampleState{sample_count = ._1}, target_info = sdl.GPUGraphicsPipelineTargetInfo { color_target_descriptions = &blur_target, num_color_targets = 1, }, }, ) if pipeline.blur_pipeline == nil { log.errorf("Failed to create backdrop blur graphics pipeline: %s", sdl.GetError()) return pipeline, false } //----- Storage buffer for Backdrop_Primitive instances ------------- pipeline.primitive_buffer = create_buffer( device, size_of(Backdrop_Primitive) * BUFFER_INIT_SIZE, sdl.GPUBufferUsageFlags{.GRAPHICS_STORAGE_READ}, ) or_return //----- Sampler ---------------------------------- pipeline.sampler = sdl.CreateGPUSampler( device, sdl.GPUSamplerCreateInfo { min_filter = .LINEAR, mag_filter = .LINEAR, mipmap_mode = .LINEAR, address_mode_u = .CLAMP_TO_EDGE, address_mode_v = .CLAMP_TO_EDGE, address_mode_w = .CLAMP_TO_EDGE, }, ) if pipeline.sampler == nil { log.errorf("Could not create backdrop GPU sampler: %s", sdl.GetError()) return pipeline, false } log.debug("Done creating backdrop pipeline") return pipeline, true } @(private) destroy_pipeline_2d_backdrop :: proc(device: ^sdl.GPUDevice, pipeline: ^Pipeline_2D_Backdrop) { if pipeline.h_blur_texture != nil do sdl.ReleaseGPUTexture(device, pipeline.h_blur_texture) if pipeline.downsample_texture != nil do sdl.ReleaseGPUTexture(device, pipeline.downsample_texture) if pipeline.source_texture != nil do sdl.ReleaseGPUTexture(device, pipeline.source_texture) if pipeline.sampler != nil do sdl.ReleaseGPUSampler(device, pipeline.sampler) destroy_buffer(device, &pipeline.primitive_buffer) if pipeline.blur_pipeline != nil do sdl.ReleaseGPUGraphicsPipeline(device, pipeline.blur_pipeline) if pipeline.downsample_pipeline != nil do sdl.ReleaseGPUGraphicsPipeline(device, pipeline.downsample_pipeline) } // --------------------------------------------------------------------------------------------------------------------- // ----- Working texture management ---- // --------------------------------------------------------------------------------------------------------------------- // Allocate (or reallocate, on resize) the three working textures that the backdrop bracket // uses. `source_texture` is full swapchain resolution; the other two are ¼-res. All single- // sample, all share the swapchain format, all need {.COLOR_TARGET, .SAMPLER} usage so they // can be written by render passes and read by subsequent passes. // // Recreates on dimension change only — same-size frames hit the early-out and skip GPU // resource churn. @(private) ensure_backdrop_textures :: proc(device: ^sdl.GPUDevice, format: sdl.GPUTextureFormat, width, height: u32) { pipeline := &GLOB.pipeline_2d_backdrop if pipeline.source_texture != nil && pipeline.cached_width == width && pipeline.cached_height == height { return } // Free any prior allocations (handles resize and the very-first call where these are nil). if pipeline.h_blur_texture != nil { sdl.ReleaseGPUTexture(device, pipeline.h_blur_texture) pipeline.h_blur_texture = nil } if pipeline.downsample_texture != nil { sdl.ReleaseGPUTexture(device, pipeline.downsample_texture) pipeline.downsample_texture = nil } if pipeline.source_texture != nil { sdl.ReleaseGPUTexture(device, pipeline.source_texture) pipeline.source_texture = nil } // Working textures are sized at full swapchain resolution to support factor=1 (no downsample // for small σ, where any 2:1 round-trip would visibly soften the output). Larger factors just // write to a sub-rect via viewport-limited rendering. See the file-header comment. working_width := width working_height := height pipeline.source_texture = sdl.CreateGPUTexture( device, sdl.GPUTextureCreateInfo { type = .D2, format = format, usage = {.COLOR_TARGET, .SAMPLER}, width = width, height = height, layer_count_or_depth = 1, num_levels = 1, sample_count = ._1, }, ) if pipeline.source_texture == nil { log.panicf("Failed to create backdrop source texture (%dx%d): %s", width, height, sdl.GetError()) } pipeline.downsample_texture = sdl.CreateGPUTexture( device, sdl.GPUTextureCreateInfo { type = .D2, format = format, usage = {.COLOR_TARGET, .SAMPLER}, width = working_width, height = working_height, layer_count_or_depth = 1, num_levels = 1, sample_count = ._1, }, ) if pipeline.downsample_texture == nil { log.panicf( "Failed to create backdrop downsample texture (%dx%d): %s", working_width, working_height, sdl.GetError(), ) } pipeline.h_blur_texture = sdl.CreateGPUTexture( device, sdl.GPUTextureCreateInfo { type = .D2, format = format, usage = {.COLOR_TARGET, .SAMPLER}, width = working_width, height = working_height, layer_count_or_depth = 1, num_levels = 1, sample_count = ._1, }, ) if pipeline.h_blur_texture == nil { log.panicf( "Failed to create backdrop h_blur texture (%dx%d): %s", working_width, working_height, sdl.GetError(), ) } pipeline.cached_width = width pipeline.cached_height = height } // --------------------------------------------------------------------------------------------------------------------- // ----- Kernel computation ------------ // --------------------------------------------------------------------------------------------------------------------- // Compute Gaussian blur kernel weights with the linear-sampling pair adjustment. // Adapted from RAD Debugger's r_d3d11_g_blur_shader_src CPU-side coefficient generation // and Daniel Rákos's "Efficient Gaussian blur with linear sampling" article. // // The trick: bilinear sampling lets us fetch (1-t)*pixel[i] + t*pixel[i+1] with a single // texture lookup. So for any pair of adjacent discrete weights w0, w1 we can collapse them // into one bilinear fetch with weight w = w0+w1 sampled at offset i + w1/w. This halves the // fragment-shader sample count for a given kernel radius. // // Output: `kernel[0]` is the center weight (offset 0), and `kernel[1..pair_count-1]` each // hold one paired tap (sampled symmetrically as ±offset in the shader). The shader iterates // `i in [1, pair_count)` and does two texture fetches per pair — one at +offset, one at // -offset — for a total of 1 + 2*(pair_count-1) bilinear fetches per fragment. // // `sigma` is the true Gaussian standard deviation in the kernel's working-space units (¼-res // texels, after the caller has converted from logical pixels via dpi_scaling and the // downsample factor). The kernel extent reaches ±3σ, capturing 99.7% of the Gaussian's // mass; weights beyond that contribute imperceptibly. sigma <= 0 produces a degenerate // kernel `{1, 0}` that acts as a sharp pass-through. After the loop, the discrete weights // are normalized so they sum to 1.0 (truncating at ±3σ loses a tiny amount of mass; we // renormalize to preserve overall image brightness). // // Earlier versions of this routine ported RAD Debugger's algorithm verbatim, which derives // stdev from a tap-count parameter (`stdev = (blur_count-1)/2`). That made the parameter // name misleading: the user thought they were passing σ but were actually passing // half-kernel-width. This version takes σ directly and derives the tap count from it, // matching what callers expect when they read "gaussian_sigma". @(private) compute_blur_kernel :: proc(sigma: f32, kernel: ^[MAX_BACKDROP_KERNEL_PAIRS][4]f32) -> (pair_count: u32) { if sigma <= 0 { kernel[0] = {1, 0, 0, 0} return 1 } // Per-side discrete tap count: ceil(3*sigma) + 1 (center + 3σ reach on each side). // Cap at the storage budget. With MAX_BACKDROP_KERNEL_PAIRS=32 each pair collapses 2 // discrete taps via linear-sampling, so max discrete taps per side = 1 + 31*2 = 63. discrete_taps := u32(math.ceil(3 * sigma)) + 1 max_taps := u32(MAX_BACKDROP_KERNEL_PAIRS - 1) * 2 + 1 if discrete_taps > max_taps do discrete_taps = max_taps if discrete_taps < 2 { // Sigma was so small that 3σ < 1 texel; degenerate to a sharp sample. kernel[0] = {1, 0, 0, 0} return 1 } // Compute discrete weights[i] = exp(-i² / (2σ²)). The inv_root prefactor cancels in the // final normalization, so we skip it. weights: [MAX_BACKDROP_KERNEL_PAIRS * 2]f32 = {} two_sigma_sq := 2 * sigma * sigma total: f32 = 0 for i in 0 ..< discrete_taps { x := f32(i) weights[i] = math.exp(-x * x / two_sigma_sq) // weights[0] is the center; weights[1..] are sampled on both sides, so they count twice. total += weights[i] if i == 0 else 2 * weights[i] } // Normalize so the kernel sums to exactly 1.0 across the full ±3σ extent. if total > 0 { inv_total := 1.0 / total for i in 0 ..< discrete_taps do weights[i] *= inv_total } // Linear-sampling pair adjustment: weights[1] and weights[2] collapse to one bilinear // fetch with weight w = w0+w1 at offset i + w1/w. `weights` is sized 2*MAX so that // `weights[i+1]` access on odd i up to discrete_taps-1 is always in bounds. kernel[0] = {weights[0], 0, 0, 0} pair_count = 1 for i := u32(1); i < discrete_taps; i += 2 { w0 := weights[i] w1 := weights[i + 1] w := w0 + w1 // Guard against a div-by-zero where both adjacent weights underflow to 0 (only happens // at the tail of a very tight kernel; numerically-degenerate but legal). offset := f32(i) if w > 0 do offset = f32(i) + w1 / w kernel[pair_count] = {w, offset, 0, 0} pair_count += 1 } return pair_count } // --------------------------------------------------------------------------------------------------------------------- // ----- Uniform push helpers ---------- // --------------------------------------------------------------------------------------------------------------------- // Push the Backdrop_Vert_Uniforms block to the vertex stage at slot 0. @(private) push_backdrop_vert_globals :: proc(cmd_buffer: ^sdl.GPUCommandBuffer, width: f32, height: f32, mode: u32) { uniforms := Backdrop_Vert_Uniforms { projection = ortho_rh(left = 0.0, top = 0.0, right = width, bottom = height, near = -1.0, far = 1.0), dpi_scale = GLOB.dpi_scaling, mode = mode, } sdl.PushGPUVertexUniformData(cmd_buffer, 0, &uniforms, size_of(Backdrop_Vert_Uniforms)) } // Push the Backdrop_Downsample_Frag_Uniforms block to the fragment stage at slot 0. @(private) push_backdrop_downsample_frag_globals :: proc( cmd_buffer: ^sdl.GPUCommandBuffer, source_width, source_height: u32, downsample_factor: u32, ) { uniforms := Backdrop_Downsample_Frag_Uniforms { inv_source_size = {1.0 / f32(source_width), 1.0 / f32(source_height)}, downsample_factor = downsample_factor, } sdl.PushGPUFragmentUniformData(cmd_buffer, 0, &uniforms, size_of(Backdrop_Downsample_Frag_Uniforms)) } // Pick a downsample factor for a given sigma. See the file-header comment for the table and // rationale. Returned values: {1, 2, 4}. @(private) compute_backdrop_downsample_factor :: proc(sigma_logical: f32) -> u32 { sigma_phys := sigma_logical * GLOB.dpi_scaling switch { case sigma_phys <= 4: return 1 case sigma_phys <= 8: return 2 case: return 4 } } // Push the Backdrop_Frag_Uniforms block (kernel + pass mode/direction) to the fragment stage at slot 0. @(private) push_backdrop_blur_frag_globals :: proc( cmd_buffer: ^sdl.GPUCommandBuffer, uniforms: ^Backdrop_Frag_Uniforms, ) { sdl.PushGPUFragmentUniformData(cmd_buffer, 0, uniforms, size_of(Backdrop_Frag_Uniforms)) } // --------------------------------------------------------------------------------------------------------------------- // ----- Storage-buffer upload --------- // --------------------------------------------------------------------------------------------------------------------- // Upload all Backdrop_Primitive instances staged this frame to the backdrop pipeline's storage // buffer. Mirrors the SDF primitive upload in pipeline_2d_base.odin's `upload`. Called from // `end()` inside the same copy pass that uploads vertices/indices/SDF primitives. @(private) upload_backdrop_primitives :: proc(device: ^sdl.GPUDevice, pass: ^sdl.GPUCopyPass) { prim_count := u32(len(GLOB.tmp_backdrop_primitives)) if prim_count == 0 do return prim_size := prim_count * size_of(Backdrop_Primitive) grow_buffer_if_needed( device, &GLOB.pipeline_2d_backdrop.primitive_buffer, prim_size, sdl.GPUBufferUsageFlags{.GRAPHICS_STORAGE_READ}, ) prim_array := sdl.MapGPUTransferBuffer(device, GLOB.pipeline_2d_backdrop.primitive_buffer.transfer, false) if prim_array == nil { log.panicf("Failed to map backdrop primitive transfer buffer: %s", sdl.GetError()) } mem.copy(prim_array, raw_data(GLOB.tmp_backdrop_primitives), int(prim_size)) sdl.UnmapGPUTransferBuffer(device, GLOB.pipeline_2d_backdrop.primitive_buffer.transfer) sdl.UploadToGPUBuffer( pass, sdl.GPUTransferBufferLocation{transfer_buffer = GLOB.pipeline_2d_backdrop.primitive_buffer.transfer}, sdl.GPUBufferRegion{buffer = GLOB.pipeline_2d_backdrop.primitive_buffer.gpu, offset = 0, size = prim_size}, false, ) } // --------------------------------------------------------------------------------------------------------------------- // ----- Frame / layer scanners -------- // --------------------------------------------------------------------------------------------------------------------- // Returns true if any sub-batch in any layer this frame is .Backdrop kind. Called once at the // top of `end()` to decide whether to route the whole frame to source_texture (Approach B). // O(total sub-batches) but with an early-exit on the first hit, so typical cost is tiny. @(private) frame_has_backdrop :: proc() -> bool { for &batch in GLOB.tmp_sub_batches { if batch.kind == .Backdrop do return true } return false } // Returns the absolute index of the first .Backdrop sub-batch in the layer's sub-batch range, // or -1 if the layer has no backdrops. The index is into GLOB.tmp_sub_batches (not relative to // layer.sub_batch_start), to match how draw_layer's render-range helpers consume it. @(private) find_first_backdrop_in_layer :: proc(layer: ^Layer) -> int { for i in 0 ..< layer.sub_batch_len { abs_idx := layer.sub_batch_start + i if GLOB.tmp_sub_batches[abs_idx].kind == .Backdrop do return int(abs_idx) } return -1 } // --------------------------------------------------------------------------------------------------------------------- // ----- Bracket scheduler ------------- // --------------------------------------------------------------------------------------------------------------------- // Compute the union AABB of the backdrop primitives in a contiguous-same-sigma sub-batch run // (one "sigma group"), expanded by 6 sigmas of blur reach (the kernel weight beyond 3σ is // negligible; halo of 6σ covers both the H-blur reads from downsample and the V-blur reads // from h_blur, since each pass extends its kernel another 3σ from its output position). // Returns a viewport in physical pixels for the full-resolution render target; the caller // divides by the chosen downsample factor for the working-resolution passes. // // Per-group (rather than per-layer) because the adaptive downsample picks a different factor // per sigma, and the kernel reach is also per-sigma. A tighter region per group means less // fragment work in the downsample and H-blur passes. @(private) compute_backdrop_group_work_region :: proc( group_start, group_end: u32, sigma_logical: f32, swapchain_width, swapchain_height: u32, ) -> ( region_x, region_y, region_w, region_h: u32, ) { dpi := GLOB.dpi_scaling has_any := false min_x: f32 = 0 min_y: f32 = 0 max_x: f32 = 0 max_y: f32 = 0 for i in group_start ..< group_end { batch := GLOB.tmp_sub_batches[i] if batch.kind != .Backdrop do continue for p in batch.offset ..< batch.offset + batch.count { prim := GLOB.tmp_backdrop_primitives[p] // prim.bounds is in logical pixels (world space). if !has_any { min_x = prim.bounds[0] min_y = prim.bounds[1] max_x = prim.bounds[2] max_y = prim.bounds[3] has_any = true } else { if prim.bounds[0] < min_x do min_x = prim.bounds[0] if prim.bounds[1] < min_y do min_y = prim.bounds[1] if prim.bounds[2] > max_x do max_x = prim.bounds[2] if prim.bounds[3] > max_y do max_y = prim.bounds[3] } } } if !has_any do return 0, 0, 0, 0 // Halo = 6σ. The bracket runs two sequential blur passes (H then V). H reads downsample // at ±3σ from its output; V reads h_blur at ±3σ from its output. So for V outputs at // primitive_AABB to be valid, h_blur must be valid at primitive_AABB ±3σ, which requires // the downsample valid at primitive_AABB ±6σ. halo_logical := 6.0 * sigma_logical min_x -= halo_logical min_y -= halo_logical max_x += halo_logical max_y += halo_logical // Convert to physical pixels and clamp to swapchain bounds. phys_min_x := math.max(min_x * dpi, 0) phys_min_y := math.max(min_y * dpi, 0) phys_max_x := math.min(max_x * dpi, f32(swapchain_width)) phys_max_y := math.min(max_y * dpi, f32(swapchain_height)) if phys_max_x <= phys_min_x || phys_max_y <= phys_min_y do return 0, 0, 0, 0 region_x = u32(phys_min_x) region_y = u32(phys_min_y) region_w = u32(phys_max_x - phys_min_x) region_h = u32(phys_max_y - phys_min_y) return } // Run the backdrop bracket for one layer. Assumes: // - source_texture currently holds the pre-bracket frame contents (Pass A has already // rendered everything that should appear behind the backdrop). // - The caller has invoked ensure_backdrop_textures with current swapchain dimensions. // - At least one .Backdrop sub-batch exists in the layer (caller checked). // // Per-sigma-group execution. The bracket walks the layer's sub-batches in submission order, // grouping contiguous-same-sigma .Backdrop sub-batches. For each group: // 1. Pick a downsample factor using compute_backdrop_downsample_factor. // 2. Compute that group's work region (primitives' AABB + 6σ halo, clamped). // 3. Downsample: source_texture → downsample_texture, viewport-limited to // work_region/factor. Writes into a sub-rect of the working texture. // 4. H-blur (mode 0, direction=H): downsample_texture → h_blur_texture, same viewport. // 5. V-blur (mode 0, direction=V): h_blur_texture → downsample_texture (ping-pong reuse; // downsample_texture's data is no longer needed). Same viewport. // 6. Composite (mode 1): downsample_texture (now holds H+V blur) → source_texture, full- // target viewport, per-primitive SDF discard handles masking and applies the tint. Each // sub-batch in the group is one instanced draw. // // V-blur was historically combined with the composite into a single shader invocation, but // that produced a horizontal-vs-vertical asymmetry artifact (horizontal source features // looked sharper than vertical ones inside the panel). Splitting V-blur into its own // working→working pass restores symmetry by making H and V blurs structurally identical. // // On exit, source_texture contains the pre-bracket contents plus all backdrop primitives // composited on top. The caller then runs Pass B (post-bracket non-backdrop sub-batches) on // source_texture with LOAD. @(private) run_backdrop_bracket :: proc( cmd_buffer: ^sdl.GPUCommandBuffer, layer: ^Layer, swapchain_width, swapchain_height: u32, ) { pipeline := &GLOB.pipeline_2d_backdrop full_viewport := sdl.GPUViewport { x = 0, y = 0, w = f32(swapchain_width), h = f32(swapchain_height), min_depth = 0, max_depth = 1, } full_scissor := sdl.Rect { x = 0, y = 0, w = i32(swapchain_width), h = i32(swapchain_height), } // Working textures are at full swapchain resolution. Each per-group factor=N pass writes // only to a sub-rect of dimensions (work_region_phys / N), via viewport-limited rendering. layer_end := layer.sub_batch_start + layer.sub_batch_len i := layer.sub_batch_start for i < layer_end { batch := GLOB.tmp_sub_batches[i] if batch.kind != .Backdrop { i += 1 continue } // Find the contiguous run of .Backdrop sub-batches with this sigma. sigma := batch.gaussian_sigma group_start := i group_end := i + 1 for group_end < layer_end { next := GLOB.tmp_sub_batches[group_end] if next.kind != .Backdrop || next.gaussian_sigma != sigma do break group_end += 1 } // Pick downsample factor for this group. downsample_factor := compute_backdrop_downsample_factor(sigma) // Compute this group's work region (primitive AABB + 6σ halo, in physical pixels). region_x, region_y, region_w, region_h := compute_backdrop_group_work_region( group_start, group_end, sigma, swapchain_width, swapchain_height, ) if region_w == 0 || region_h == 0 { i = group_end continue } // Convert work region to working-resolution coords (divide by factor, ceil-round-up). working_x := region_x / downsample_factor working_y := region_y / downsample_factor working_w := (region_w + downsample_factor - 1) / downsample_factor working_h := (region_h + downsample_factor - 1) / downsample_factor // Working textures are sized at min factor (2). At factor=4 we have only half the texture // area available in each axis. Clamp to the texture extent for either case. wt_w := pipeline.cached_width / downsample_factor wt_h := pipeline.cached_height / downsample_factor if working_x + working_w > wt_w do working_w = wt_w - working_x if working_y + working_h > wt_h do working_h = wt_h - working_y if working_w == 0 || working_h == 0 { i = group_end continue } working_viewport := sdl.GPUViewport { x = f32(working_x), y = f32(working_y), w = f32(working_w), h = f32(working_h), min_depth = 0, max_depth = 1, } working_scissor := sdl.Rect { x = i32(working_x), y = i32(working_y), w = i32(working_w), h = i32(working_h), } // inv_working_size is always relative to the actual texture extent (full swapchain res). // At factor>1 we're only using a sub-rect, but the texture coords are still divided by the // full texture's dimensions because that's what gl_FragCoord operates on. inv_working_size := [2]f32{1.0 / f32(pipeline.cached_width), 1.0 / f32(pipeline.cached_height)} // Convert the user's logical-pixel sigma into the kernel's working space. // sigma_working_texels = sigma_logical * dpi_scaling / downsample_factor. effective_sigma := sigma * GLOB.dpi_scaling / f32(downsample_factor) frag_uniforms := Backdrop_Frag_Uniforms { inv_working_size = inv_working_size, inv_downsample_factor = 1.0 / f32(downsample_factor), } frag_uniforms.pair_count = compute_blur_kernel(effective_sigma, &frag_uniforms.kernel) //----- Downsample (source_texture → downsample_texture, viewport-limited) ---------- { pass := sdl.BeginGPURenderPass( cmd_buffer, &sdl.GPUColorTargetInfo { texture = pipeline.downsample_texture, load_op = .DONT_CARE, store_op = .STORE, cycle = true, }, 1, nil, ) sdl.BindGPUGraphicsPipeline(pass, pipeline.downsample_pipeline) sdl.SetGPUViewport(pass, working_viewport) sdl.SetGPUScissor(pass, working_scissor) push_backdrop_downsample_frag_globals( cmd_buffer, pipeline.cached_width, pipeline.cached_height, downsample_factor, ) sdl.BindGPUFragmentSamplers( pass, 0, &sdl.GPUTextureSamplerBinding{texture = pipeline.source_texture, sampler = pipeline.sampler}, 1, ) sdl.DrawGPUPrimitives(pass, 3, 1, 0, 0) sdl.EndGPURenderPass(pass) } //----- H-blur (mode 0, direction=H): downsample_texture → h_blur_texture -------- { frag_uniforms.mode = 0 frag_uniforms.direction = {1, 0} pass := sdl.BeginGPURenderPass( cmd_buffer, &sdl.GPUColorTargetInfo { texture = pipeline.h_blur_texture, load_op = .DONT_CARE, store_op = .STORE, cycle = true, }, 1, nil, ) sdl.BindGPUGraphicsPipeline(pass, pipeline.blur_pipeline) sdl.SetGPUViewport(pass, working_viewport) sdl.SetGPUScissor(pass, working_scissor) // Mode 0's vertex shader is a fullscreen triangle that ignores `projection`; pass // the standard ortho anyway so the same uniform block works for both modes. push_backdrop_vert_globals(cmd_buffer, f32(swapchain_width), f32(swapchain_height), 0) push_backdrop_blur_frag_globals(cmd_buffer, &frag_uniforms) // The blur PSO is declared with num_storage_buffers = 1 (mode 1 reads it). SDL3 GPU // validation requires the binding to be present for *any* draw on this PSO, even // though mode 0's shader path doesn't actually read it. Bind it here too. sdl.BindGPUVertexStorageBuffers(pass, 0, ([^]^sdl.GPUBuffer)(&pipeline.primitive_buffer.gpu), 1) sdl.BindGPUFragmentSamplers( pass, 0, &sdl.GPUTextureSamplerBinding{texture = pipeline.downsample_texture, sampler = pipeline.sampler}, 1, ) sdl.DrawGPUPrimitives(pass, 3, 1, 0, 0) sdl.EndGPURenderPass(pass) } //----- V-blur (mode 0, direction=V): h_blur_texture → downsample_texture -------- // Ping-pong reuse: downsample_texture's data is no longer needed once H-blur has // produced its output, so we reuse it as the V-blur target. Saves allocating a third // working texture. { frag_uniforms.mode = 0 frag_uniforms.direction = {0, 1} pass := sdl.BeginGPURenderPass( cmd_buffer, &sdl.GPUColorTargetInfo { texture = pipeline.downsample_texture, load_op = .DONT_CARE, store_op = .STORE, cycle = true, }, 1, nil, ) sdl.BindGPUGraphicsPipeline(pass, pipeline.blur_pipeline) sdl.SetGPUViewport(pass, working_viewport) sdl.SetGPUScissor(pass, working_scissor) push_backdrop_vert_globals(cmd_buffer, f32(swapchain_width), f32(swapchain_height), 0) push_backdrop_blur_frag_globals(cmd_buffer, &frag_uniforms) sdl.BindGPUVertexStorageBuffers(pass, 0, ([^]^sdl.GPUBuffer)(&pipeline.primitive_buffer.gpu), 1) sdl.BindGPUFragmentSamplers( pass, 0, &sdl.GPUTextureSamplerBinding{texture = pipeline.h_blur_texture, sampler = pipeline.sampler}, 1, ) sdl.DrawGPUPrimitives(pass, 3, 1, 0, 0) sdl.EndGPURenderPass(pass) } //----- Composite (mode 1): downsample_texture (now holds H+V blur) → source_texture -- // No kernel applied here — the working texture is already fully blurred. The shader just // upsamples (via bilinear filtering on the read), applies the SDF mask, and applies the // tint. One render pass for the whole sigma group; each sub-batch issues its own draw // call because non-contiguous-but-same-sigma sub-batches couldn't coalesce upstream. { frag_uniforms.mode = 1 // direction is unused in mode 1 but keep it set so reading the uniform doesn't see // undefined data on platforms that care about that. frag_uniforms.direction = {0, 0} pass := sdl.BeginGPURenderPass( cmd_buffer, &sdl.GPUColorTargetInfo{texture = pipeline.source_texture, load_op = .LOAD, store_op = .STORE}, 1, nil, ) sdl.BindGPUGraphicsPipeline(pass, pipeline.blur_pipeline) sdl.SetGPUViewport(pass, full_viewport) sdl.SetGPUScissor(pass, full_scissor) push_backdrop_vert_globals(cmd_buffer, f32(swapchain_width), f32(swapchain_height), 1) push_backdrop_blur_frag_globals(cmd_buffer, &frag_uniforms) sdl.BindGPUVertexStorageBuffers(pass, 0, ([^]^sdl.GPUBuffer)(&pipeline.primitive_buffer.gpu), 1) sdl.BindGPUFragmentSamplers( pass, 0, &sdl.GPUTextureSamplerBinding{texture = pipeline.downsample_texture, sampler = pipeline.sampler}, 1, ) for j in group_start ..< group_end { grp := GLOB.tmp_sub_batches[j] sdl.DrawGPUPrimitives(pass, 6, grp.count, 0, grp.offset) } sdl.EndGPURenderPass(pass) } i = group_end } } // --------------------------------------------------------------------------------------------------------------------- // ----- Primitive builders ------------ // --------------------------------------------------------------------------------------------------------------------- // Internal // // Build a Backdrop_Primitive with bounds, radii, and feather computed from rectangle // geometry. The caller sets `color` (tint) on the returned primitive before submitting. // // No rotation, no outline — backdrop primitives are intentionally limited to axis-aligned // RRects in v1. Rotation breaks screen-space blur sampling visually; outline would be a // specialized edge effect that belongs in its own primitive type. @(private) build_backdrop_primitive :: proc( rect: Rectangle, radii: Rectangle_Radii, feather_px: f32, ) -> Backdrop_Primitive { max_radius := min(rect.width, rect.height) * 0.5 clamped_top_left := clamp(radii.top_left, 0, max_radius) clamped_top_right := clamp(radii.top_right, 0, max_radius) clamped_bottom_right := clamp(radii.bottom_right, 0, max_radius) clamped_bottom_left := clamp(radii.bottom_left, 0, max_radius) half_feather := feather_px * 0.5 padding := half_feather / GLOB.dpi_scaling dpi_scale := GLOB.dpi_scaling half_width := rect.width * 0.5 half_height := rect.height * 0.5 center_x := rect.x + half_width center_y := rect.y + half_height return Backdrop_Primitive { bounds = { center_x - half_width - padding, center_y - half_height - padding, center_x + half_width + padding, center_y + half_height + padding, }, // Radii ordering matches the shader's sdRoundedBox swizzle: // (p.x > 0) ? r.xy : r.zw picks right-vs-left half // then (p.y > 0) ? rxy.x : rxy.y picks bottom-vs-top within that half // So slot 0 = bottom-right, slot 1 = top-right, slot 2 = bottom-left, slot 3 = top-left. radii = { clamped_bottom_right * dpi_scale, clamped_top_right * dpi_scale, clamped_bottom_left * dpi_scale, clamped_top_left * dpi_scale, }, half_size = {half_width * dpi_scale, half_height * dpi_scale}, half_feather = half_feather, } } // Internal — append a Backdrop_Primitive to the staging array and emit a .Backdrop sub-batch // carrying the requested gaussian_sigma. Sub-batch coalescing in append_or_extend_sub_batch // will merge contiguous backdrops that share a sigma into a single instanced draw. @(private) prepare_backdrop_primitive :: proc(layer: ^Layer, prim: Backdrop_Primitive, gaussian_sigma: f32) { offset := u32(len(GLOB.tmp_backdrop_primitives)) append(&GLOB.tmp_backdrop_primitives, prim) scissor := &GLOB.scissors[layer.scissor_start + layer.scissor_len - 1] append_or_extend_sub_batch( scissor, layer, .Backdrop, offset = offset, count = 1, gaussian_sigma = gaussian_sigma, ) } // --------------------------------------------------------------------------------------------------------------------- // ----- Public API -------------------- // --------------------------------------------------------------------------------------------------------------------- // Draw a rectangle whose interior samples a Gaussian-blurred snapshot of the framebuffer // behind it. RRect-only — covers rectangles, rounded rectangles, and circles via // uniform_radii. // // `gaussian_sigma` is the Gaussian standard deviation in logical pixels. Typical UI range is // 4..24. sigma <= 0 produces a sharp framebuffer mirror (no blur). // // `tint` controls the color of the frosted glass: // - tint.rgb is the tint color. // - tint.a is the tint *mix strength*, NOT panel opacity. The panel is always fully // opaque inside its mask (matching real frosted glass and iOS UIBlurEffect / CSS // backdrop-filter). At alpha=0 the user sees the pure blur unchanged; at alpha=255 // the blur is fully multiplied by tint.rgb. Intermediate values lerp between the two. // - For a translucent panel layered over content, draw a separate translucent rect on // top instead — the backdrop's job is to deliver the blur, not to blend with what's // beneath it. // // Backdrop primitives have no rotation: backdrop sampling is in screen space, so a rotated // mask over a stationary blur sample would look visually wrong. iOS UIVisualEffectView, // CSS backdrop-filter, and Flutter BackdropFilter all enforce this implicitly; we enforce // it explicitly by leaving no rotation parameter. // // Within a single layer, primitives sharing the same `gaussian_sigma` share one H+V blur // pass pair via sub-batch coalescing. Primitives with different sigmas in the same layer // trigger separate blur passes (cost scales with the number of unique sigmas). // // Submission ordering is asymmetric: a non-backdrop draw submitted between two backdrops in // the same layer renders *on top of* both backdrops, not between them. Use `draw.new_layer` // to interleave. See README.md § "Backdrop pipeline" for the full bracket scheduling model. gaussian_blur :: proc( layer: ^Layer, rect: Rectangle, gaussian_sigma: f32, tint: Color = DFT_TINT, radii: Rectangle_Radii = {}, feather_px: f32 = DFT_FEATHER_PX, ) { prim := build_backdrop_primitive(rect, radii, feather_px) prim.color = tint prepare_backdrop_primitive(layer, prim, gaussian_sigma) }