Shaders Module API Reference
This reference focuses on the WGSL entry points and data layouts used by Visionary. Each section lists the relevant structs, bindings, and specialization options so you can modify or extend the shaders safely.
1. Preprocess (preprocess.wgsl)
Entry point
```wgsl
@compute @workgroup_size(256, 1, 1)
fn preprocess(@builtin(global_invocation_id) gid: vec3<u32>,
              @builtin(num_workgroups) wgs: vec3<u32>)
```
- Processes one Gaussian per invocation. Exits early if gid.x >= uModel.num_points.
- Writes to the shared splat buffer (points_2d), sorter buffers, and indirect counters.
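Because each invocation handles one Gaussian and the workgroup size is 256, the host has to launch enough workgroups to cover every point. A minimal sketch of that arithmetic, assuming a direct (non-indirect) one-dimensional dispatch; PREPROCESS_WG_SIZE and preprocessWorkgroupCount are hypothetical names, not part of the Visionary API:

```typescript
// Must match @workgroup_size(256, 1, 1) in preprocess.wgsl.
const PREPROCESS_WG_SIZE = 256;

// Number of workgroups needed so that every Gaussian gets one invocation.
// Surplus invocations in the last workgroup are the ones that take the
// gid.x >= uModel.num_points early exit.
function preprocessWorkgroupCount(numPoints: number): number {
  if (!Number.isInteger(numPoints) || numPoints < 0) {
    throw new RangeError("numPoints must be a non-negative integer");
  }
  return Math.ceil(numPoints / PREPROCESS_WG_SIZE);
}
```

For example, 257 points need two workgroups, and 255 of the 512 launched invocations exit immediately.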
Bindings
| Group.Binding | Resource |
|---|---|
| 0.0 | CameraUniforms (view/proj matrices + viewport/focal) |
| 1.0 | gaussians_packed : array |
| 1.1 | color_buffer : array |
| 1.2 | points_2d : array |
| 2.0 | SortInfos (atomic counters) |
| 2.1 | sort_depths : array |
| 2.2 | sort_indices : array |
| 2.3 | DispatchIndirect |
| 3.0 | RenderSettings |
| 3.1 | ModelParams (per-model transform + precision metadata) |
Key structs
- CameraUniforms: view/proj matrices, inverse matrices, viewport, focal lengths.
- Splat: packed eigenvectors, NDC position, high-precision z, packed RGBA.
- ModelParams: transform matrix, baseOffset, num_points, gaussianScaling, maxShDeg, kernelSize, opacityScale, cutoffScale, rendermode, plus quantization fields (gaussDataType, colorDataType, scales/zero points).
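The scale and zero-point fields suggest that INT8/UINT8 data is stored with an affine quantization. A minimal sketch of the usual scheme, assuming value = (q - zeroPoint) * scale; this reference does not state the exact formula the shaders use, and dequantize / quantizeUint8 are hypothetical helper names:

```typescript
// Affine dequantization: map a stored integer back to a float.
// ASSUMPTION: the shaders use the common (q - zeroPoint) * scale form.
function dequantize(q: number, scale: number, zeroPoint: number): number {
  return (q - zeroPoint) * scale;
}

// Inverse mapping for UINT8 storage, clamped to the representable range.
function quantizeUint8(value: number, scale: number, zeroPoint: number): number {
  const q = Math.round(value / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q));
}
```

Whatever the exact formula, the loader and the shader-side read helpers have to agree on it, since the shader only sees the stored integers plus these per-model fields.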
Helper functions
- read_gaussian_pos_opacity(idx) / read_gaussian_cov(idx): branch on storage precision (FP32/FP16/INT8/UINT8).
- read_color_channel / sh_coef: read raw RGB or SH coefficients depending on USE_RAW_COLOR and layout overrides.
- evaluate_sh(dir, idx, sh_deg): real SH evaluation up to degree 3 (16 coefficients per color channel).
- applyDistanceScaling, applyPanning, applyRotation live in orbit math, not here.
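A real SH basis of degree d has (d + 1)² coefficients per color channel, which is where the 16 coefficients at degree 3 come from. The sketch below computes that count and a flat index for the two storage orders selected by SH_LAYOUT_CHANNEL_MAJOR. The index formulas are assumptions about what "interleaved" and "channel-major" mean here and should be checked against the loader; shCoefCount and shCoefIndex are hypothetical names.

```typescript
// Coefficients per color channel for SH degree `deg`: (deg + 1)^2.
function shCoefCount(deg: number): number {
  return (deg + 1) * (deg + 1);
}

// Flat index of coefficient `coef` for channel `ch` (0 = R, 1 = G, 2 = B)
// within one splat's SH block.
// ASSUMED layouts:
//   interleaved:   R0 G0 B0 R1 G1 B1 ...
//   channel-major: R0 R1 ... Rn G0 G1 ... Gn B0 B1 ... Bn
function shCoefIndex(coef: number, ch: number, deg: number, channelMajor: boolean): number {
  const n = shCoefCount(deg);
  if (coef < 0 || coef >= n || ch < 0 || ch > 2) {
    throw new RangeError("coefficient or channel out of range");
  }
  return channelMajor ? ch * n + coef : coef * 3 + ch;
}
```

Both layouts occupy the same 3 × (d + 1)² slots per splat; only the order differs, so a mismatch between loader and shader produces wrong colors rather than out-of-bounds reads.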
Specialization overrides
| Name | Default | Effect |
|---|---|---|
| MAX_SH_DEG | (injected) | Maximum SH degree to evaluate per splat (0 to 3). |
| USE_RAW_COLOR | false | Treat color buffer as RGB instead of SH coefficients. |
| SH_LAYOUT_CHANNEL_MAJOR | false | Switch between interleaved and channel-major SH storage. |
| DISCARD_BY_WORLD_TRACE | false | Enable world-space covariance trace culling. |
| MAX_WORLD_TRACE | 0.25 | Threshold when the above is enabled. |
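If these values are declared as WGSL override constants, WebGPU lets the host set them through the constants map of a pipeline stage, where boolean overrides are passed as 0 or 1. The sketch below only builds such a map from the defaults in the table. Whether Visionary sets each name through pipeline constants or through string injection is not stated in this reference, and PreprocessOverrides / buildPreprocessConstants are hypothetical names.

```typescript
interface PreprocessOverrides {
  maxShDeg: number;               // MAX_SH_DEG has no default, so it is required
  useRawColor?: boolean;          // default false
  shLayoutChannelMajor?: boolean; // default false
  discardByWorldTrace?: boolean;  // default false
  maxWorldTrace?: number;         // default 0.25
}

// Build a name -> number map in the shape WebGPU expects for `constants`.
function buildPreprocessConstants(o: PreprocessOverrides): Record<string, number> {
  if (!Number.isInteger(o.maxShDeg) || o.maxShDeg < 0 || o.maxShDeg > 3) {
    throw new RangeError("maxShDeg must be an integer in 0..3");
  }
  return {
    MAX_SH_DEG: o.maxShDeg,
    USE_RAW_COLOR: o.useRawColor ? 1 : 0,
    SH_LAYOUT_CHANNEL_MAJOR: o.shLayoutChannelMajor ? 1 : 0,
    DISCARD_BY_WORLD_TRACE: o.discardByWorldTrace ? 1 : 0,
    MAX_WORLD_TRACE: o.maxWorldTrace ?? 0.25,
  };
}
```

The resulting object would be passed as the constants field next to module and entryPoint when creating the compute pipeline.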
2. Radix sort (radix_sort.wgsl)
Entry points
```wgsl
@compute @workgroup_size(histogram_wg_size) fn zero_histograms(...)
@compute @workgroup_size(histogram_wg_size) fn calculate_histogram(...)
@compute @workgroup_size(prefix_wg_size) fn prefix_histogram(...)
@compute @workgroup_size(scatter_wg_size) fn scatter_even(...)
@compute @workgroup_size(scatter_wg_size) fn scatter_odd(...)
```
Workgroup sizes and radix parameters are prepended at compile time (see GPURSSorter.processShaderTemplate).
Bindings (Group 0)
- 0.0: SortInfos / GeneralInfo (atomic counters)
- 0.1: Histogram buffer (atomic<u32>)
- 0.2: Key buffer A (depth keys)
- 0.3: Key buffer B (ping-pong)
- 0.4: Payload buffer A (splat indices)
- 0.5: Payload buffer B (ping-pong)
Constants injected
```wgsl
const histogram_wg_size : u32 = ...;
const histogram_sg_size : u32 = ...;
const prefix_wg_size : u32 = ...;
const scatter_wg_size : u32 = ...;
const rs_radix_log2 : u32 = 8u;  // 256 buckets
const rs_keyval_size : u32 = 4u; // 4 passes for 32-bit keys
```
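These are plain WGSL const declarations placed in front of the shader body. A minimal sketch of how such a header could be generated on the host; this is not the actual GPURSSorter.processShaderTemplate implementation, and RadixSortConfig, buildRadixSortHeader, and prependRadixSortHeader are hypothetical names:

```typescript
interface RadixSortConfig {
  histogramWgSize: number;
  histogramSgSize: number;
  prefixWgSize: number;
  scatterWgSize: number;
}

// Emit one `const name : u32 = <value>u;` line per injected constant.
function buildRadixSortHeader(cfg: RadixSortConfig): string {
  const decl = (name: string, value: number): string =>
    `const ${name} : u32 = ${value}u;`;
  return [
    decl("histogram_wg_size", cfg.histogramWgSize),
    decl("histogram_sg_size", cfg.histogramSgSize),
    decl("prefix_wg_size", cfg.prefixWgSize),
    decl("scatter_wg_size", cfg.scatterWgSize),
    decl("rs_radix_log2", 8),  // 256 buckets
    decl("rs_keyval_size", 4), // 4 passes for 32-bit keys
  ].join("\n") + "\n";
}

// Prepend the generated header to the raw shader source.
function prependRadixSortHeader(cfg: RadixSortConfig, shaderBody: string): string {
  return buildRadixSortHeader(cfg) + shaderBody;
}
```

Because the workgroup sizes appear inside @workgroup_size(...) attributes, they have to be known when the module is compiled, which is why they are injected as source text.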
Workflow
- zero_histograms clears histograms and resets pass metadata.
- calculate_histogram populates histograms for all passes concurrently using shared memory.
- prefix_histogram produces exclusive prefix sums per digit.
- scatter_even/scatter_odd move keys/payloads into their sorted positions using the prefix offsets and update ping-pong buffers.
DispatchIndirect.dispatch_x determines how many workgroups to launch when using indirect dispatch; preprocess increments it as splats are written.
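The four stages above can be mirrored by a sequential CPU reference, which is handy for validating GPU output in tests. The sketch below is a least-significant-digit radix sort with 8-bit digits and ping-pong buffers; it illustrates the technique only and is not a port of radix_sort.wgsl. radixSortPairs is a hypothetical name.

```typescript
const RADIX_LOG2 = 8;
const RADIX = 1 << RADIX_LOG2;  // 256 buckets
const PASSES = 32 / RADIX_LOG2; // 4 passes for 32-bit keys

function radixSortPairs(
  keys: Uint32Array,
  payloads: Uint32Array,
): { keys: Uint32Array; payloads: Uint32Array } {
  const n = keys.length;
  if (payloads.length !== n) {
    throw new RangeError("keys and payloads must have equal length");
  }
  // Ping-pong buffers: A holds the current input, B receives the scatter.
  let keysA = Uint32Array.from(keys);
  let valsA = Uint32Array.from(payloads);
  let keysB = new Uint32Array(n);
  let valsB = new Uint32Array(n);

  // zero_histograms + calculate_histogram: one zeroed histogram per pass,
  // all filled in a single sweep (digit counts do not depend on key order).
  const hist = new Uint32Array(PASSES * RADIX);
  for (let i = 0; i < n; i++) {
    for (let p = 0; p < PASSES; p++) {
      const digit = (keysA[i] >>> (p * RADIX_LOG2)) & (RADIX - 1);
      hist[p * RADIX + digit]++;
    }
  }

  // prefix_histogram: exclusive prefix sum over the digits of each pass.
  for (let p = 0; p < PASSES; p++) {
    let sum = 0;
    for (let d = 0; d < RADIX; d++) {
      const count = hist[p * RADIX + d];
      hist[p * RADIX + d] = sum;
      sum += count;
    }
  }

  // scatter_even / scatter_odd: stable scatter, then swap the buffers.
  for (let p = 0; p < PASSES; p++) {
    for (let i = 0; i < n; i++) {
      const digit = (keysA[i] >>> (p * RADIX_LOG2)) & (RADIX - 1);
      const dst = hist[p * RADIX + digit]++;
      keysB[dst] = keysA[i];
      valsB[dst] = valsA[i];
    }
    [keysA, keysB] = [keysB, keysA];
    [valsA, valsB] = [valsB, valsA];
  }
  // PASSES is even, so the sorted data ends up back in the A buffers.
  return { keys: keysA, payloads: valsA };
}
```

In the renderer's terms, keys would be the depth keys and payloads the splat indices; the sort is stable, so splats with equal keys keep their input order.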
3. Gaussian renderer (gaussian.wgsl)
Vertex shader
```wgsl
@vertex
fn vs_main(@builtin(vertex_index) vertex_id: u32,
           @builtin(instance_index) instance_id: u32) -> VertexOutput
```
- Fetches Splat by points_2d[indices[instance_id]].
- Generates four vertices per instance (screen-aligned quad scaled by eigenvectors × CUTOFF).
- Outputs clip-space position, local screen coordinates, and color.
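One common way to produce the four corners is to derive them from vertex_index and draw each instance as a triangle strip. The sketch below mirrors that idea on the CPU. It assumes the two eigenvectors are already-scaled 2D screen-space axes and that corners are emitted in the order shown; neither detail is confirmed by this reference, and all names are hypothetical.

```typescript
type Vec2 = [number, number];

const CUTOFF = Math.sqrt(Math.log(255));

// Corner in [-1, 1]^2 for vertex ids 0..3 (assumed triangle-strip order).
function quadCorner(vertexId: number): Vec2 {
  const x = (vertexId & 1) === 0 ? -1 : 1;
  const y = (vertexId & 2) === 0 ? -1 : 1;
  return [x, y];
}

// Offset from the splat center, plus the local coordinate handed to the
// fragment stage. v1 and v2 are the (assumed pre-scaled) eigenvector axes.
function quadVertex(
  vertexId: number,
  v1: Vec2,
  v2: Vec2,
): { offset: Vec2; screenPos: Vec2 } {
  const [cx, cy] = quadCorner(vertexId);
  const sx = cx * CUTOFF;
  const sy = cy * CUTOFF;
  return {
    offset: [sx * v1[0] + sy * v2[0], sx * v1[1] + sy * v2[1]],
    screenPos: [sx, sy],
  };
}
```

With this convention the local coordinate spans [-CUTOFF, CUTOFF] on both axes, so the quad encloses the cutoff circle that the fragment stage tests against.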
Fragment shader
```wgsl
@fragment
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32>
```
- Computes r² = dot(screen_pos, screen_pos).
- Discards fragments outside the cutoff circle.
- Evaluates Gaussian falloff exp(-r²) and multiplies by the stored alpha (capped at 0.99) to avoid fully opaque clamping.
- Returns premultiplied color for correct alpha blending.
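The steps above can be sketched as a CPU function for unit-testing expected pixel values. It assumes the 0.99 cap is applied to the product of falloff and stored alpha, which is one reading of the description; shadeFragment is a hypothetical name and null stands in for discard.

```typescript
type RGBA = [number, number, number, number];

// exp(-CUTOFF^2) = 1/255, so anything beyond this radius is below ~1/255.
const CUTOFF = Math.sqrt(Math.log(255));

function shadeFragment(screenPos: [number, number], color: RGBA): RGBA | null {
  // r^2 = dot(screen_pos, screen_pos)
  const r2 = screenPos[0] * screenPos[0] + screenPos[1] * screenPos[1];
  // Discard fragments outside the cutoff circle.
  if (r2 > CUTOFF * CUTOFF) {
    return null;
  }
  // Gaussian falloff times stored alpha, capped at 0.99 (ASSUMED order).
  const alpha = Math.min(0.99, Math.exp(-r2) * color[3]);
  // Premultiplied output: RGB scaled by alpha.
  return [color[0] * alpha, color[1] * alpha, color[2] * alpha, alpha];
}
```

Premultiplied output pairs with a blend state whose source color factor is one and whose destination factor is one-minus-source-alpha.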
Bindings
| Group.Binding | Resource |
|---|---|
| 0.2 | points_2d (read-only splats) |
| 1.4 | indices (sorted payloads written by radix sort) |
Constant
CUTOFF = sqrt(log(255)) ≈ 2.3539, where log is the natural logarithm. Since exp(-CUTOFF²) = 1/255, fragments are discarded once the Gaussian falls below ~1/255 opacity.
4. Utility kernels
- compress_gaussians.wgsl: compute shader that reads FP32 splats and writes quantized versions (used offline or for testing). Shares Gaussian, ModelParams, and RenderSettings structs.
- convert_precision.wgsl: similar conversion kernel used in the ONNX precision pipeline (takes existing GPU buffers, writes new ones).
- debug-helpers.wgsl: small compute functions exposed via developer tooling to copy/inspect GPU buffers.
5. TypeScript integration (src/shaders/index.ts)
```ts
export { default as preprocessShader } from './preprocess.wgsl?raw';
export { default as gaussianShader } from './gaussian.wgsl?raw';
export { default as radixSortShader } from './radix_sort.wgsl?raw';
```
The renderer and preprocess pipeline replace placeholder strings (e.g.,
6. Usage checklist
- Update ModelParams if you add new per-model uniform fields (preprocess reads them directly).
- When changing SH layouts or precision modes, keep read_color_channel / sh_coef_* helpers in sync with the loader.
- Any new shader must follow the existing binding order so renderer/preprocess do not need extra bind groups.
- For indirect workloads, ensure preprocess writes both SortInfos.keys_size and DispatchIndirect.dispatch_x; radix sort and renderer consume them without CPU intervention.