Uniform Module Architecture

The uniform module is intentionally small: a single UniformBuffer class, a couple of type helpers, and a handful of functions in UniformUtils. Together they enforce WebGPU’s uniform-buffer layout constraints and keep CPU↔GPU transfers predictable.

Design goals

Cache-first: keep an ArrayBuffer copy of the uniform data on the CPU so multiple updates can occur before a single queue.writeBuffer call.
Shared layouts: expose UniformBuffer.bindGroupLayout() so every pipeline can reuse the same uniform bind group layout (@binding(0) uniform buffer, visible in VS/FS/CS).
Alignment guarantees: provide helpers (UniformUtils.alignSize, packVec, packMat4) so camera and render-setting structs always obey WebGPU’s 16-byte rules.
Minimal surface area: no pooling or mapping; construct once and reuse.

Data lifecycle

```
new UniformBuffer(device, initBytes)
  ├─ creates GPU buffer (UNIFORM | COPY_DST)
  ├─ copies init bytes into CPU cache (_data)
  └─ uploads initial contents with queue.writeBuffer

setData(view)   → copy bytes into _data → mark dirty (implicitly)

flush(device?)  → dev.queue.writeBuffer(buffer, 0, _data)

destroy()       → buffer.destroy()
```

data getter returns a cloned ArrayBuffer for inspection/debugging.
dataBytes setter replaces the cache wholesale (used when ONNX overwrites model param uniforms via staging buffers).
clone() simply constructs another UniformBuffer with the cached bytes.
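The lifecycle above can be condensed into a short class. This is a minimal sketch under stated assumptions, not the module's source. UniformBufferSketch is a hypothetical stand-in for UniformBuffer. The DeviceLike, QueueLike and BufferLike interfaces are hypothetical narrowings of GPUDevice, GPUQueue and GPUBuffer, so the code runs without a GPU. The two usage constants copy the WebGPU spec values of GPUBufferUsage.UNIFORM and GPUBufferUsage.COPY_DST.

```typescript
// Hypothetical narrow stand-ins for GPUBuffer / GPUQueue / GPUDevice,
// limited to the calls this sketch makes so it runs without a GPU.
interface BufferLike {
  destroy(): void;
}
interface QueueLike {
  writeBuffer(buffer: BufferLike, offset: number, data: ArrayBuffer): void;
}
interface DeviceLike {
  queue: QueueLike;
  createBuffer(desc: { size: number; usage: number }): BufferLike;
}

// Spec values of GPUBufferUsage.UNIFORM and GPUBufferUsage.COPY_DST.
const USAGE_UNIFORM = 0x0040;
const USAGE_COPY_DST = 0x0008;

class UniformBufferSketch {
  readonly buffer: BufferLike;
  private device: DeviceLike;
  private _data: ArrayBuffer; // CPU-side cache of the uniform bytes

  constructor(device: DeviceLike, initBytes: ArrayBuffer) {
    this.device = device;
    this.buffer = device.createBuffer({
      size: initBytes.byteLength,
      usage: USAGE_UNIFORM | USAGE_COPY_DST,
    });
    this._data = initBytes.slice(0); // private copy; caller may reuse initBytes
    device.queue.writeBuffer(this.buffer, 0, this._data); // initial upload
  }

  get size(): number {
    return this._data.byteLength;
  }

  // Copy bytes into the cache only; nothing reaches the GPU until flush().
  setData(view: ArrayBufferView): void {
    if (view.byteLength !== this.size) {
      throw new Error(`expected ${this.size} bytes, got ${view.byteLength}`);
    }
    new Uint8Array(this._data).set(
      new Uint8Array(view.buffer, view.byteOffset, view.byteLength),
    );
  }

  // Upload the whole cache with a single queue.writeBuffer call.
  flush(device: DeviceLike = this.device): void {
    device.queue.writeBuffer(this.buffer, 0, this._data);
  }

  // Return a copy so callers cannot mutate the cache directly.
  get data(): ArrayBuffer {
    return this._data.slice(0);
  }

  // Replace the cache wholesale, with the same size check as setData.
  set dataBytes(bytes: ArrayBuffer) {
    if (bytes.byteLength !== this.size) {
      throw new Error(`expected ${this.size} bytes, got ${bytes.byteLength}`);
    }
    this._data = bytes.slice(0);
  }

  clone(): UniformBufferSketch {
    return new UniformBufferSketch(this.device, this._data);
  }

  destroy(): void {
    this.buffer.destroy();
  }
}
```

Because setData only touches the cache, several calls per frame collapse into one writeBuffer at flush time.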

Memory layout notes

Uniform buffers must follow WGSL's uniform address-space layout rules, which resemble std140 padding; this module sizes them in multiples of 16 bytes:

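Scalars (f32, u32, i32) occupy 4 bytes with 4-byte alignment; the 16-byte rounding applies to array element strides and nested struct members in the uniform address space.
vec2 → 8 bytes (8-byte aligned); vec3 → 12 bytes but 16-byte aligned, so the helpers pad it to 16 bytes (treated as vec4).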
mat4x4 is stored column-major, 64 bytes total.
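The sketch below shows how these rules combine by computing member offsets and the total size of a flat struct. It is illustrative only: layoutStruct and MemberType are hypothetical and not part of UniformUtils. It uses the WGSL alignments: 4 for f32, 8 for vec2f, and 16 for vec3f, vec4f and mat4x4f.

```typescript
type MemberType = "f32" | "vec2f" | "vec3f" | "vec4f" | "mat4x4f";

// Alignment and size in bytes of each type under the WGSL layout rules.
const LAYOUT: Record<MemberType, { align: number; size: number }> = {
  f32: { align: 4, size: 4 },
  vec2f: { align: 8, size: 8 },
  vec3f: { align: 16, size: 12 },
  vec4f: { align: 16, size: 16 },
  mat4x4f: { align: 16, size: 64 }, // four column vectors of vec4f
};

const roundUp = (align: number, n: number): number =>
  Math.ceil(n / align) * align;

// Compute byte offsets of each member and the struct's total size.
function layoutStruct(members: [string, MemberType][]) {
  const offsets: Record<string, number> = {};
  let offset = 0;
  let structAlign = 1;
  for (const [name, type] of members) {
    const { align, size } = LAYOUT[type];
    offset = roundUp(align, offset); // insert padding before the member
    offsets[name] = offset;
    offset += size;
    structAlign = Math.max(structAlign, align);
  }
  // The struct size is rounded up to its largest member alignment.
  return { offsets, size: roundUp(structAlign, offset) };
}
```

For example, [f32, vec3f, f32] places the vec3f at offset 16 and the second f32 at offset 28, in the vec3f's unused tail. The result is a 32-byte struct.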

UniformUtils takes care of these details:

```ts
UniformUtils.alignSize(100);          // → 112
UniformUtils.packVec([1, 2, 3], 3);   // → Float32Array([1, 2, 3, 0])
UniformUtils.packMat4(matrix16);      // ensures 16 elements
UniformUtils.createAlignedBuffer(n);  // returns an ArrayBuffer padded to 16 bytes
```
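Here is a possible implementation of these helpers, written as free functions for brevity. It is illustrative only. The signatures are inferred from the calls above. The behavior of packVec for 2- and 4-component vectors (no padding) is an assumption.

```typescript
// Hypothetical re-implementation of the UniformUtils helpers.
const UNIFORM_ALIGN = 16;

// Round a byte count up to the next multiple of 16.
function alignSize(bytes: number): number {
  return Math.ceil(bytes / UNIFORM_ALIGN) * UNIFORM_ALIGN;
}

// Pack a vector; vec3 data gets a trailing zero so it fills a 16-byte slot.
function packVec(values: ArrayLike<number>, components: 2 | 3 | 4): Float32Array {
  if (values.length < components) {
    throw new Error(`packVec needs ${components} values, got ${values.length}`);
  }
  const out = new Float32Array(components === 3 ? 4 : components);
  out.set(Array.from(values).slice(0, components));
  return out;
}

// Copy a column-major 4x4 matrix, rejecting anything that is not 16 floats.
function packMat4(m: ArrayLike<number>): Float32Array {
  if (m.length !== 16) {
    throw new Error(`packMat4 expects 16 elements, got ${m.length}`);
  }
  return Float32Array.from(m);
}

// Allocate a zeroed ArrayBuffer whose size is padded to 16 bytes.
function createAlignedBuffer(bytes: number): ArrayBuffer {
  return new ArrayBuffer(alignSize(bytes));
}
```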

Bind group layout

UniformBuffer.bindGroupLayout(device) always returns the same layout:

```wgsl
@group(N) @binding(0) var<uniform> ...;
```

Visibility bits cover VS/FS/CS, so the same layout works for the preprocess, sort, and render pipelines. Each UniformBuffer constructs its own bindGroup using that shared layout.
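One way to guarantee that bindGroupLayout(device) returns the same object every time is to cache it per device. The sketch below uses a WeakMap cache for this, which is an assumption; the real module may store the layout differently. LayoutDevice is a hypothetical narrowing of GPUDevice. The stage constants are the spec values of GPUShaderStage.VERTEX, FRAGMENT and COMPUTE.

```typescript
// Spec values of GPUShaderStage.VERTEX / FRAGMENT / COMPUTE.
const STAGE_VERTEX = 0x1;
const STAGE_FRAGMENT = 0x2;
const STAGE_COMPUTE = 0x4;

// Hypothetical narrow stand-in for GPUDevice so the sketch runs without a GPU.
interface LayoutDevice {
  createBindGroupLayout(desc: {
    entries: { binding: number; visibility: number; buffer: { type: "uniform" } }[];
  }): object;
}

// One layout per device; a WeakMap lets a discarded device be garbage-collected.
const layoutCache = new WeakMap<LayoutDevice, object>();

function uniformBindGroupLayout(device: LayoutDevice): object {
  let layout = layoutCache.get(device);
  if (!layout) {
    layout = device.createBindGroupLayout({
      entries: [
        {
          binding: 0,
          visibility: STAGE_VERTEX | STAGE_FRAGMENT | STAGE_COMPUTE,
          buffer: { type: "uniform" },
        },
      ],
    });
    layoutCache.set(device, layout);
  }
  return layout;
}
```

With a real device, each UniformBuffer would then create its bind group with device.createBindGroup({ layout, entries: [{ binding: 0, resource: { buffer } }] }).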

Integration patterns

Camera + render settings

GaussianRenderer and GaussianPreprocessor each own two uniforms:

cameraUniforms (272 bytes: view, viewInv, proj, projInv, viewport, fov)
settingsUniforms (~80 bytes: clipping boxes, gaussian scaling, SH degree, toggles)

Before encoding compute or render passes they call setData(...) followed by flush().
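The 272-byte camera figure follows from its field list. Four mat4x4 matrices take 4 × 64 = 256 bytes, and two vec2 fields add 8 bytes each. The packing sketch below assumes exactly that order, with viewport and fov each stored as vec2<f32>. packCamera and CameraState are hypothetical names, not the renderer's actual code.

```typescript
// Hypothetical CPU-side mirror of the 272-byte camera struct, assuming the
// WGSL side declares, in this order:
//   view, viewInv, proj, projInv: mat4x4f  (64 bytes each, column-major)
//   viewport: vec2f, fov: vec2f            (8 bytes each)
const CAMERA_UNIFORM_BYTES = 272;

interface CameraState {
  view: ArrayLike<number>; // 16 floats, column-major
  viewInv: ArrayLike<number>;
  proj: ArrayLike<number>;
  projInv: ArrayLike<number>;
  viewport: [number, number]; // width, height in pixels
  fov: [number, number]; // assumed: two floats
}

function packCamera(cam: CameraState): Float32Array {
  const out = new Float32Array(CAMERA_UNIFORM_BYTES / 4); // 68 floats
  const mats = [cam.view, cam.viewInv, cam.proj, cam.projInv];
  mats.forEach((m, i) => {
    if (m.length !== 16) throw new Error(`matrix ${i} must have 16 elements`);
    out.set(m, i * 16); // byte offsets 0, 64, 128, 192
  });
  out.set(cam.viewport, 64); // byte offset 256
  out.set(cam.fov, 66); // byte offset 264
  return out;
}
```

The result can be passed directly to cameraUniforms.setData(...) before flush().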

Model params

Every PointCloud constructs modelParamsUniforms = new UniformBuffer(device, <128-byte struct>), whose layout mirrors the WGSL struct (transform, offsets, scaling, precision metadata). Preprocess updates this buffer via updateModelParamsBuffer or updateModelParamsWithOffset and flushes it before dispatching compute.

ONNX counters

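Dynamic point clouds pass an extra countBuffer. Preprocess flushes the model params uniform, then uses copyBufferToBuffer to overwrite the num_points slot at byte offset 68. The renderer can later inspect modelParamsUniforms.data for debugging. This CPU cache only reflects bytes written from the CPU, however. The GPU-side num_points copy appears there only if it is written back into the cache, for example through the dataBytes setter.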
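The order of these two steps matters. queue.writeBuffer (issued by flush) executes before any command buffer submitted afterwards, so the copy recorded in the encoder lands on top of the freshly uploaded struct. Below is a minimal sketch of that order. It assumes the count is stored as a u32 at offset 0 of countBuffer. EncoderLike and UniformLike are hypothetical narrowings of GPUCommandEncoder and UniformBuffer, and flushModelParamsWithCount is a hypothetical name.

```typescript
// Hypothetical narrow stand-ins for GPUCommandEncoder and UniformBuffer.
interface EncoderLike {
  copyBufferToBuffer(
    src: object, srcOffset: number,
    dst: object, dstOffset: number,
    size: number,
  ): void;
}
interface UniformLike {
  buffer: object;
  flush(): void;
}

// Byte offset of the u32 num_points field in the 128-byte model params struct.
const NUM_POINTS_OFFSET = 68;

function flushModelParamsWithCount(
  encoder: EncoderLike,
  modelParams: UniformLike,
  countBuffer: object,
): void {
  // 1. Upload the CPU cache now via queue.writeBuffer.
  modelParams.flush();
  // 2. Record a GPU-side copy of the 4-byte count over the num_points slot.
  //    copyBufferToBuffer requires offsets and size to be multiples of 4.
  encoder.copyBufferToBuffer(countBuffer, 0, modelParams.buffer, NUM_POINTS_OFFSET, 4);
}
```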

Error checking

setData throws if view.byteLength !== size, preventing accidental partial writes.
dataBytes setter also enforces the same size check.
packMat4 throws unless exactly 16 elements are supplied.

Future hooks

The current implementation is intentionally barebones. If we ever need buffer pooling or persistent mapping we can extend UniformBuffer to accept a custom usage flag (already supported via UniformConfig) and add a UniformPool. For now, the simplicity keeps per-frame uniform updates trivial and easy to reason about.