Shaders Module API Reference

This reference focuses on the WGSL entry points and data layouts used by Visionary. Each section lists the relevant structs, bindings, and specialization options so you can modify or extend the shaders safely.

1. Preprocess (preprocess.wgsl)

Entry point

@compute @workgroup_size(256, 1, 1)
fn preprocess(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) wgs: vec3<u32>)

  • Processes one Gaussian per invocation and exits early if gid.x >= uModel.num_points.
  • Writes to the shared splat buffer (points_2d), the sorter buffers, and the indirect counters.
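Because each invocation handles one Gaussian and the workgroup size is 256, the host has to launch enough workgroups to cover every point. A minimal sketch of that arithmetic, assuming a one-dimensional dispatch (the helper name `preprocessWorkgroupCount` is hypothetical and not part of Visionary):

```typescript
// Workgroup size taken from the @workgroup_size(256, 1, 1) attribute.
const PREPROCESS_WG_SIZE = 256;

// Hypothetical helper: number of workgroups needed so that every Gaussian
// gets one invocation. Surplus invocations in the last workgroup are the
// ones that hit the `gid.x >= uModel.num_points` early exit.
function preprocessWorkgroupCount(numPoints: number): number {
  return Math.ceil(numPoints / PREPROCESS_WG_SIZE);
}

// 1000 points need 4 workgroups (1024 invocations, 24 of which exit early).
console.log(preprocessWorkgroupCount(1000));
```

If the real dispatch is split across more than one dimension, the same rounding applies to the total invocation count.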

Bindings

Group.Binding Resource
0.0 CameraUniforms (view/proj matrices + viewport/focal)
1.0 gaussians_packed : array (raw Gaussian data, format selected by uModel.gaussDataType)
1.1 color_buffer : array (SH coefficients or RGB payload, format selected by uModel.colorDataType)
1.2 points_2d : array (read/write splat buffer)
2.0 SortInfos (atomic counters)
2.1 sort_depths : array
2.2 sort_indices : array
2.3 DispatchIndirect
3.0 RenderSettings
3.1 ModelParams (per-model transform + precision metadata)

Key structs

  • CameraUniforms: view/proj matrices, inverse matrices, viewport, focal lengths.
  • Splat: packed eigenvectors, NDC position, high-precision z, packed RGBA.
  • ModelParams: transform matrix, baseOffset, num_points, gaussianScaling, maxShDeg, kernelSize, opacityScale, cutoffScale, rendermode, plus quantization fields (gaussDataType, colorDataType, scales/zero points).
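The scale and zero-point fields in ModelParams suggest an affine quantization scheme for the INT8/UINT8 formats. The sketch below shows the usual form of that mapping on the CPU. It is an assumption about the convention, not a transcription of the shader; the function names are hypothetical.

```typescript
// Assumed affine mapping: real = (q - zeroPoint) * scale.
function dequantize(q: number, scale: number, zeroPoint: number): number {
  return (q - zeroPoint) * scale;
}

// Inverse mapping for UINT8 storage, clamped to the representable range.
function quantizeUint8(value: number, scale: number, zeroPoint: number): number {
  const q = Math.round(value / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q));
}

// Round trip: the reconstruction error is at most half a quantization step.
const scale = 1 / 255;
const q = quantizeUint8(0.5, scale, 0);
console.log(q, dequantize(q, scale, 0));
```

Whatever the exact convention, the loader that writes the quantized buffers and the shader helpers that read them have to agree on it.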

Helper functions

  • read_gaussian_pos_opacity(idx) / read_gaussian_cov(idx): branch on storage precision (FP32/FP16/INT8/UINT8).
  • read_color_channel / sh_coef: read raw RGB or SH coefficients depending on USE_RAW_COLOR and layout overrides.
  • evaluate_sh(dir, idx, sh_deg): real SH evaluation up to degree 3 (16 coefficients per color channel).
  • applyDistanceScaling, applyPanning, applyRotation live in orbit math, not here.

Specialization overrides

Name Default Effect
MAX_SH_DEG (injected) Maximum SH degree to evaluate per splat (0 to 3).
USE_RAW_COLOR false Treat color buffer as RGB instead of SH coefficients.
SH_LAYOUT_CHANNEL_MAJOR false Switch between interleaved and channel-major SH storage.
DISCARD_BY_WORLD_TRACE false Enable world-space covariance trace culling.
MAX_WORLD_TRACE 0.25 Threshold when the above is enabled.
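To make the SH_LAYOUT_CHANNEL_MAJOR switch concrete, the sketch below computes where a coefficient would sit inside one splat's SH block under each layout. The indexing rules are assumptions chosen for illustration (interleaved stored as RGB triplets per coefficient, channel-major stored as all coefficients of one channel followed by the next); check them against the loader before relying on them.

```typescript
// A real SH basis of degree d has (d + 1)^2 coefficients per channel,
// so degree 3 gives the 16 coefficients mentioned above.
function shCoefCount(deg: number): number {
  return (deg + 1) * (deg + 1);
}

// Hypothetical offset of (coef, channel) within one splat's SH block.
// channel: 0 = R, 1 = G, 2 = B.
function shOffset(coef: number, channel: number, maxDeg: number, channelMajor: boolean): number {
  const n = shCoefCount(maxDeg);
  return channelMajor
    ? channel * n + coef // RRRR... GGGG... BBBB...
    : coef * 3 + channel; // RGB RGB RGB ...
}

console.log(shCoefCount(3));            // 16
console.log(shOffset(1, 2, 3, false));  // 5
console.log(shOffset(1, 2, 3, true));   // 33
```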

2. Radix sort (radix_sort.wgsl)

Entry points

@compute @workgroup_size(histogram_wg_size) fn zero_histograms(...)
@compute @workgroup_size(histogram_wg_size) fn calculate_histogram(...)
@compute @workgroup_size(prefix_wg_size) fn prefix_histogram(...)
@compute @workgroup_size(scatter_wg_size) fn scatter_even(...)
@compute @workgroup_size(scatter_wg_size) fn scatter_odd(...)

Workgroup sizes and radix parameters are prepended at compile time (see GPURSSorter.processShaderTemplate).

Bindings (Group 0)

0.0 SortInfos / GeneralInfo (atomic counters)
0.1 Histogram buffer (atomic<u32>)
0.2 Key buffer A (depth keys)
0.3 Key buffer B (ping-pong)
0.4 Payload buffer A (splat indices)
0.5 Payload buffer B (ping-pong)

Constants injected

const histogram_wg_size : u32 = ...;
const histogram_sg_size : u32 = ...;
const prefix_wg_size : u32 = ...;
const scatter_wg_size : u32 = ...;
const rs_radix_log2 : u32 = 8u; // 256 buckets
const rs_keyval_size : u32 = 4u; // 4 passes for 32-bit keys
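A rough sketch of how such a prelude could be assembled on the TypeScript side before the shader source is compiled. This is not the actual GPURSSorter.processShaderTemplate implementation; the `RadixSortConfig` interface, the function names, and the numeric values in the usage line are all hypothetical.

```typescript
// Hypothetical configuration object; the field names are illustrative.
interface RadixSortConfig {
  histogramWgSize: number;
  histogramSgSize: number;
  prefixWgSize: number;
  scatterWgSize: number;
}

// Builds the WGSL constant declarations listed above as a single string.
function buildRadixPrelude(cfg: RadixSortConfig): string {
  return [
    `const histogram_wg_size : u32 = ${cfg.histogramWgSize}u;`,
    `const histogram_sg_size : u32 = ${cfg.histogramSgSize}u;`,
    `const prefix_wg_size : u32 = ${cfg.prefixWgSize}u;`,
    `const scatter_wg_size : u32 = ${cfg.scatterWgSize}u;`,
    `const rs_radix_log2 : u32 = 8u;`,
    `const rs_keyval_size : u32 = 4u;`,
  ].join("\n") + "\n";
}

// Prepends the prelude to the raw shader text.
function withPrelude(cfg: RadixSortConfig, source: string): string {
  return buildRadixPrelude(cfg) + source;
}

// Example values only; real sizes depend on the device.
const exampleCfg: RadixSortConfig = { histogramWgSize: 256, histogramSgSize: 32, prefixWgSize: 128, scatterWgSize: 256 };
console.log(withPrelude(exampleCfg, "// shader body"));
```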

Workflow

  1. zero_histograms clears histograms and resets pass metadata.
  2. calculate_histogram populates histograms for all passes concurrently using shared memory.
  3. prefix_histogram produces exclusive prefix sums per digit.
  4. scatter_even/scatter_odd move keys/payloads into their sorted positions using the prefix offsets and update ping-pong buffers.
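The four steps above can be mirrored by a small CPU reference, which is handy for validating GPU output in tests. This is a sketch of a generic least-significant-digit radix sort with 8-bit digits and four passes, not a port of the shader; `radixSortCpu` is a hypothetical name.

```typescript
const RADIX_LOG2 = 8;               // matches rs_radix_log2
const RADIX_SIZE = 1 << RADIX_LOG2; // 256 buckets
const PASSES = 4;                   // 4 passes for 32-bit keys

function radixSortCpu(keys: Uint32Array, payload: Uint32Array): { keys: Uint32Array; payload: Uint32Array } {
  const n = keys.length;

  // Steps 1 and 2: start from zeroed histograms and fill all passes in one sweep.
  const hist = new Uint32Array(PASSES * RADIX_SIZE);
  for (let i = 0; i < n; i++) {
    for (let p = 0; p < PASSES; p++) {
      const digit = (keys[i] >>> (p * RADIX_LOG2)) & (RADIX_SIZE - 1);
      hist[p * RADIX_SIZE + digit]++;
    }
  }

  // Step 3: exclusive prefix sum per pass turns counts into start offsets.
  for (let p = 0; p < PASSES; p++) {
    let sum = 0;
    for (let d = 0; d < RADIX_SIZE; d++) {
      const count = hist[p * RADIX_SIZE + d];
      hist[p * RADIX_SIZE + d] = sum;
      sum += count;
    }
  }

  // Step 4: stable scatter, swapping the A/B (ping-pong) buffers after each pass.
  let srcK = new Uint32Array(keys);
  let srcP = new Uint32Array(payload);
  let dstK = new Uint32Array(n);
  let dstP = new Uint32Array(n);
  for (let p = 0; p < PASSES; p++) {
    for (let i = 0; i < n; i++) {
      const digit = (srcK[i] >>> (p * RADIX_LOG2)) & (RADIX_SIZE - 1);
      const dst = hist[p * RADIX_SIZE + digit]++;
      dstK[dst] = srcK[i];
      dstP[dst] = srcP[i];
    }
    [srcK, dstK] = [dstK, srcK];
    [srcP, dstP] = [dstP, srcP];
  }
  return { keys: srcK, payload: srcP };
}
```

Computing every pass's histogram up front works because a permutation of the keys does not change how many keys carry each digit value.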

DispatchIndirect.dispatch_x determines how many workgroups to launch when using indirect dispatch; preprocess increments it as splats are written.

3. Gaussian renderer (gaussian.wgsl)

Vertex shader

@vertex
fn vs_main(@builtin(vertex_index) vertex_id: u32, @builtin(instance_index) instance_id: u32) -> VertexOutput

  • Fetches the Splat at points_2d[indices[instance_id]].
  • Generates four vertices per instance (screen-aligned quad scaled by eigenvectors × CUTOFF).
  • Outputs clip-space position, local screen coordinates, and color.
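The quad expansion can be pictured with a small CPU sketch. The corner ordering and the way the two eigenvectors are combined are assumptions made for illustration; the shader may order vertices differently or unpack the eigenvectors from the Splat in another form. `quadCorner` and `splatVertex` are hypothetical names.

```typescript
type Vec2 = [number, number];

const CUTOFF = Math.sqrt(Math.log(255));

// Assumed corner order for a 4-vertex triangle strip:
// 0 -> (-1,-1), 1 -> (1,-1), 2 -> (-1,1), 3 -> (1,1).
function quadCorner(vertexId: number): Vec2 {
  return [(vertexId & 1) * 2 - 1, ((vertexId >> 1) & 1) * 2 - 1];
}

// center: splat position in NDC; v1, v2: scaled eigenvectors of the 2D covariance.
function splatVertex(center: Vec2, v1: Vec2, v2: Vec2, vertexId: number): { position: Vec2; screenPos: Vec2 } {
  const [cx, cy] = quadCorner(vertexId);
  // Local coordinates span [-CUTOFF, CUTOFF] along each eigenvector.
  const screenPos: Vec2 = [cx * CUTOFF, cy * CUTOFF];
  const position: Vec2 = [
    center[0] + v1[0] * screenPos[0] + v2[0] * screenPos[1],
    center[1] + v1[1] * screenPos[0] + v2[1] * screenPos[1],
  ];
  return { position, screenPos };
}

console.log(splatVertex([0, 0], [2, 0], [0, 1], 3));
```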

Fragment shader

@fragment
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32>

  • Computes r² = dot(screen_pos, screen_pos).
  • Discards fragments outside the cutoff circle.
  • Evaluates the Gaussian falloff exp(-r²) and multiplies it by the stored alpha (capped at 0.99) to avoid fully opaque clamping.
  • Returns premultiplied color for correct alpha blending.

Bindings

Group.Binding Resource
0.2 points_2d (read-only splats)
1.4 indices (sorted payloads written by radix sort)

Constant

CUTOFF = sqrt(log(255)) ≈ 2.3539, which ensures that fragments are discarded once the Gaussian falls below ~1/255 opacity.
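The relationship between CUTOFF and the fragment logic can be checked with a CPU sketch. It assumes the discard test is r² > CUTOFF² and that the 0.99 cap applies to the final alpha; the source does not say exactly where the cap sits, so treat both as assumptions. `shadeFragment` is a hypothetical name.

```typescript
const CUTOFF = Math.sqrt(Math.log(255)); // about 2.354

type Rgba = [number, number, number, number];

// Returns null for discarded fragments, otherwise a premultiplied RGBA color.
function shadeFragment(screenPos: [number, number], color: Rgba): Rgba | null {
  const r2 = screenPos[0] * screenPos[0] + screenPos[1] * screenPos[1];
  if (r2 > CUTOFF * CUTOFF) return null; // outside the cutoff circle
  // Gaussian falloff times stored alpha, capped below full opacity.
  const alpha = Math.min(0.99, Math.exp(-r2) * color[3]);
  return [color[0] * alpha, color[1] * alpha, color[2] * alpha, alpha];
}

// At the cutoff radius the falloff is exp(-ln 255) = 1/255.
console.log(Math.exp(-CUTOFF * CUTOFF), 1 / 255);
console.log(shadeFragment([0, 0], [1, 0.5, 0, 1])); // center: alpha capped at 0.99
console.log(shadeFragment([3, 0], [1, 0.5, 0, 1])); // null: outside the circle
```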

4. Utility kernels

  • compress_gaussians.wgsl: compute shader that reads FP32 splats and writes quantized versions (used offline or for testing). Shares the Gaussian, ModelParams, and RenderSettings structs.
  • convert_precision.wgsl: similar conversion kernel used in the ONNX precision pipeline (takes existing GPU buffers, writes new ones).
  • debug-helpers.wgsl: small compute functions exposed via developer tooling to copy/inspect GPU buffers.

5. TypeScript integration (src/shaders/index.ts)

export { default as preprocessShader } from './preprocess.wgsl?raw';
export { default as gaussianShader } from './gaussian.wgsl?raw';
export { default as radixSortShader } from './radix_sort.wgsl?raw';

The renderer and preprocess pipeline replace placeholder strings (e.g., for MAX_SH_DEG) before compiling WebGPU pipelines.
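The placeholder step might look roughly like the following. The placeholder token is passed in as a parameter because its real spelling is not documented here; the `<injected>` token and the one-line shader in the example are made up for illustration, and `injectConstant` is a hypothetical name.

```typescript
// Replaces every occurrence of `placeholder` in the raw shader text with `value`.
// Throws if the placeholder is missing, so a renamed token fails loudly
// instead of producing a shader with an unresolved constant.
function injectConstant(source: string, placeholder: string, value: number): string {
  if (!source.includes(placeholder)) {
    throw new Error(`placeholder ${placeholder} not found in shader source`);
  }
  return source.split(placeholder).join(String(value));
}

// Made-up shader fragment and token, for illustration only.
const rawShader = "const MAX_SH_DEG : u32 = <injected>u;";
console.log(injectConstant(rawShader, "<injected>", 3));
```

The resulting string is what would be handed to the WebGPU shader-module creation call.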

6. Usage checklist

  • Update ModelParams if you add new per-model uniform fields (preprocess reads them directly).
  • When changing SH layouts or precision modes, keep the read_color_channel / sh_coef_* helpers in sync with the loader.
  • Any new shader must follow the existing binding order so renderer/preprocess do not need extra bind groups.
  • For indirect workloads, ensure preprocess writes both SortInfos.keys_size and DispatchIndirect.dispatch_x; radix sort and renderer consume them without CPU intervention.