Preprocessing Module
The preprocessing module converts GPU-resident 3D Gaussian splats into screen-space splats that the sorter and renderer can consume. It encapsulates the WebGPU compute pipeline (`preprocess.wgsl`), packs camera and render uniforms, evaluates spherical harmonics (or raw RGB) per Gaussian, and writes the results into the global 2D splat buffer shared by all models in a frame.
Responsibilities
- Camera-aware projection – Packages view/projection matrices (with the WebGPU Y‑flip), computes focal lengths, and streams them to the shader every frame.
- Point-cloud binding – Reuses the buffers owned by `PointCloud` (gaussians, SH, draw uniforms, model params) and wires them into the compute bind groups.
- 3D→2D transformation – Projects positions, clamps against the clipping boxes, converts the 3×3 covariance to screen-space ellipses, and optionally performs mip-splatting opacity modulation.
- Color evaluation – Executes SH evaluation up to degree 3 or, when `useRawColor=true`, treats the SH buffer as direct RGBA to match `DynamicPointCloud` outputs.
- Sorter handshake – Emits packed splats into a global `splat2D` buffer, fills depth keys/payloads, and updates the atomic counters (`keys_size`, indirect `dispatch_x`) required by `GPURSSorter`.
- Multi-model dispatch – Exposes `dispatchModel(...)` so the renderer can stream multiple point clouds into contiguous slices of the same output buffers before running a single global radix sort (a buffer-sizing sketch follows this list).
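To illustrate the multi-model contract, here is a minimal sketch of sizing the shared output buffer so each `dispatchModel` call lands in its own slice. The 32-byte splat stride follows the snapshot table below; the buffer usage flags are an assumption, and `device` and `pointClouds` are assumed to be in scope as in the usage example further down. The real allocation lives in the renderer.

```ts
// Sketch only: one contiguous slice per model, written by successive
// dispatchModel calls at increasing base offsets.
const totalPoints = pointClouds.reduce((sum, pc) => sum + pc.numPoints, 0);
const globalSplatBuffer = device.createBuffer({
  label: 'global splat2D',
  size: totalPoints * 32,        // numPoints × 32 B per packed 2D splat
  usage: GPUBufferUsage.STORAGE, // written by the compute pass, read when drawing
});
```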
Core Components
- `GaussianPreprocessor` (src/preprocess/gaussian_preprocessor.ts) – Concrete implementation that owns the compute pipeline, uniform buffers, and `dispatchModel`.
- `preprocess.wgsl` (src/shaders/preprocess.wgsl) – Shader entry point executed with 256 threads per workgroup; each thread expands one Gaussian.
- `preprocess/index.ts` – Defines `PreprocessArgs` and `IPreprocessor`, and re-exports the concrete class for the renderer and tools.
- `SortedSplats` / `GPURSSorter` – Sorting resources that expose `sorter_bg_pre`, `sorter_uni`, and `sorter_dis`; preprocessing updates these so the radix sorter can run either directly or via indirect dispatch.
Data Flow
```
[PointCloud GPU buffers] ──┐
                           ├─► GaussianPreprocessor.dispatchModel
[Camera + Render settings]─┘                   │
                                               ▼
                            Global 2D splat buffer + sort buffers
                                               │
                                               ▼
                              GPURSSorter (radix) ─► Renderer
```
The renderer typically instantiates two preprocessors (SH + raw RGB), calls `dispatchModel` once per point cloud, then runs a single radix sort followed by an indirect draw.
Uniform & Buffer Snapshot
| Resource | Size | Producer | Notes |
|---|---|---|---|
| Camera uniforms | 272 B | `GaussianPreprocessor.packCameraUniforms` | View, view⁻¹, proj (Y-flipped), proj⁻¹, viewport, focal. |
| Render settings | 80 B | `packSettingsUniforms` | Clipping boxes, Gaussian scaling, max SH degree, env/mip flags, kernel size, walltime, scene extent, center. |
| Model params | 128 B | `PointCloud.updateModelParamsWithOffset` | Model matrix, base offset, numPoints, shading knobs, precision metadata; flushed per dispatch. |
| Global `splat2D` | numPoints × 32 B | Renderer | Shared output buffer; `dispatchModel` writes into the `[baseOffset, baseOffset + count)` slice. |
| Sort buffers (`sorter_bg_pre`) | variable | `GPURSSorter` | Holds key/value ping-pong buffers, indirect dispatch arguments, and the atomic counters that preprocessing updates. |
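For orientation, a minimal sketch of how the 272-byte camera block could be laid out: four 4×4 matrices plus the viewport and focal vec2s add up to 68 floats. The field order here is an assumption; the authoritative packing is `GaussianPreprocessor.packCameraUniforms` together with the struct declared in `preprocess.wgsl`.

```ts
// Hypothetical packing of the 272 B camera uniform block (68 floats).
// Matrices are column-major Float32Arrays of length 16.
function packCameraUniformsSketch(
  view: Float32Array,
  viewInv: Float32Array,
  proj: Float32Array,      // already Y-flipped for WebGPU clip space
  projInv: Float32Array,
  viewport: [number, number],
  focal: [number, number],
): Float32Array {
  const data = new Float32Array(68); // 68 * 4 B = 272 B
  data.set(view, 0);
  data.set(viewInv, 16);
  data.set(proj, 32);
  data.set(projInv, 48);
  data.set(viewport, 64);
  data.set(focal, 66);
  return data;                       // uploaded via device.queue.writeBuffer(...)
}
```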
Compute Stage (per Gaussian)
- Load & unpack packed f16 position, opacity, covariance, and SH/RAW coefficients.
- Transform position into camera space; apply clipping‑box and frustum checks up-front to drop invisible splats early.
- Project the covariance by multiplying the 3×3 symmetric matrix with the view rotation W and the analytical Jacobian J of the perspective projection (Σ₂D = J · W · Σ₃D · Wᵀ · Jᵀ); a CPU-side reference sketch follows this list.
- Eigen-decompose the 2×2 covariance to derive the ellipse axes; add `kernelSize` for anti-aliasing and apply the optional mip-splatting opacity tweaks.
- Evaluate color via SH polynomials (degree limited by `settings.maxSHDegree`) or, if `USE_RAW_COLOR=1`, reinterpret the SH buffer as packed RGBA.
- Write the output into the global `splat2D` buffer (packed f16 axes, position, color) and emit matching depth keys / payload indices for the sorter.
- Update counters using atomics (`keys_size`, `dispatch_x`) so the radix sorter can either run a fixed dispatch or consume the indirect counts produced during preprocessing.
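The covariance projection step can be sketched on the CPU for reference. The conventions below (row-major 3×3 rotation, focal lengths in pixels, returning the symmetric 2×2 result as `[a, b, c]`) are assumptions; the WGSL in `preprocess.wgsl` is the source of truth.

```ts
// CPU-side reference for Σ₂D = J · W · Σ₃D · Wᵀ · Jᵀ.
function projectCovarianceSketch(
  cov3d: [number, number, number, number, number, number], // [xx, xy, xz, yy, yz, zz]
  W: number[][],                    // 3x3 rotation part of the view matrix
  camPos: [number, number, number], // Gaussian center in camera space
  focal: [number, number],          // [fx, fy] in pixels
): [number, number, number] {
  const [x, y, z] = camPos;
  // Jacobian of the perspective projection, evaluated at the camera-space center.
  const J = [
    [focal[0] / z, 0, (-focal[0] * x) / (z * z)],
    [0, focal[1] / z, (-focal[1] * y) / (z * z)],
  ];
  // T = J · W (2x3)
  const T = J.map(r => [0, 1, 2].map(c => r[0] * W[0][c] + r[1] * W[1][c] + r[2] * W[2][c]));
  const S = [
    [cov3d[0], cov3d[1], cov3d[2]],
    [cov3d[1], cov3d[3], cov3d[4]],
    [cov3d[2], cov3d[4], cov3d[5]],
  ];
  // Σ₂D = T · S · Tᵀ, returned as [a, b, c] of the symmetric 2x2 matrix.
  const TS = T.map(r => [0, 1, 2].map(c => r[0] * S[0][c] + r[1] * S[1][c] + r[2] * S[2][c]));
  const a = TS[0][0] * T[0][0] + TS[0][1] * T[0][1] + TS[0][2] * T[0][2];
  const b = TS[0][0] * T[1][0] + TS[0][1] * T[1][1] + TS[0][2] * T[1][2];
  const c = TS[1][0] * T[1][0] + TS[1][1] * T[1][1] + TS[1][2] * T[1][2];
  // The shader then adds kernelSize to a and c before the eigen decomposition.
  return [a, b, c];
}
```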
Multi-model Dispatch
`GaussianPreprocessor.dispatchModel` accepts:
- `camera`, `viewport` – `PerspectiveCamera` plus `[width, height]`.
- `pointCloud` – Provides Gaussian/SH buffers, draw uniforms, and the per-model uniform buffer.
- `sortStuff` – Global `PointCloudSortStuff` containing `sorter_bg_pre`, `sorter_uni`, and `sorter_dis`.
- `settings` – Renderer-generated struct (scales, SH degree, clipping boxes, etc.).
- `modelMatrix` & `baseOffset` – Passed to `PointCloud.updateModelParamsWithOffset` so each model writes into a unique slice of the global splat buffer.
- `global.splat2D` – Output storage shared by every model in the frame.
- `countBuffer?` – Optional ONNX-produced GPU buffer; its value overwrites `modelParams.num_points` (byte offset 68) after the uniforms are flushed, enabling indirect draws to track dynamic point counts.
The method also calls `pointCloud.setPrecisionForShader()` when available so quantized buffers publish their scales/zero-points into the uniform block before dispatch.
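Taken together, the arguments roughly form the shape sketched below. This is inferred from the list above; the canonical definition is `PreprocessArgs` in `src/preprocess/index.ts`, and the placeholder type aliases stand in for the project's real classes.

```ts
// Loose placeholders so the sketch stands alone; the real types come from the
// camera, point-cloud, sorting, and renderer modules.
type PerspectiveCamera = unknown;
type PointCloud = unknown;
type PointCloudSortStuff = unknown;
type RenderSettings = unknown;

interface PreprocessArgsSketch {
  camera: PerspectiveCamera;
  viewport: [number, number];     // [width, height] in pixels
  pointCloud: PointCloud;         // Gaussian/SH buffers + per-model uniforms
  sortStuff: PointCloudSortStuff; // sorter_bg_pre / sorter_uni / sorter_dis
  settings: RenderSettings;       // scaling, SH degree, clipping boxes, ...
  modelMatrix: Float32Array;      // 4x4 model matrix
  baseOffset: number;             // first slot in the global splat2D buffer
  global: { splat2D: GPUBuffer }; // shared output storage for the frame
  countBuffer?: GPUBuffer;        // optional ONNX-produced dynamic point count
}
```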
Usage Example
```ts
// Initialization (renderer boot)
const preprocessorSH = new GaussianPreprocessor();
await preprocessorSH.initialize(device, /*shDegree*/ 3, /*useRawColor*/ false);
const preprocessorRGB = new GaussianPreprocessor();
await preprocessorRGB.initialize(device, /*shDegree*/ 0, /*useRawColor*/ true);

// Per frame: preprocess every point cloud into the global buffers
const encoder = device.createCommandEncoder();
let baseOffset = 0;
for (const pc of pointClouds) {
  const pre = pc.colorMode === 'rgb' ? preprocessorRGB : preprocessorSH;
  const countBuffer = 'countBuffer' in pc ? pc.countBuffer?.() : undefined;
  pre.dispatchModel({
    camera,
    viewport: [width, height],
    pointCloud: pc,
    sortStuff: globalSortStuff,
    settings: buildRenderSettings(pc, frameSettings),
    modelMatrix: pc.transform,
    baseOffset,
    global: { splat2D: globalSplatBuffer },
    countBuffer,
  }, encoder);
  baseOffset += pc.numPoints;
}

// Later in the frame: run the global radix sort + renderer draw
```
Performance & Tuning
- Workgroup size 256 keeps occupancy high across desktop and mobile GPUs.
- Early exits (frustum + clipping box) avoid unnecessary matrix math for culled splats.
- Packed f16 buffers halve bandwidth for Gaussian attributes and SH data.
- Atomic contention grows with extremely high visibility rates; consider aggressive clipping boxes or a smaller `gaussianScaling` on dense scenes.
- Dual pipelines (SH vs. raw RGB) avoid runtime branches; `USE_RAW_COLOR` is compiled into the shader via pipeline constants (a minimal sketch follows this list).
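As a sketch of the dual-pipeline point above: assuming `preprocess.wgsl` declares `USE_RAW_COLOR` as an overridable constant and uses an entry point named `preprocess` (both assumptions here), the flag can be baked in when the pipeline is created, so the per-Gaussian loop never branches on it.

```ts
declare const device: GPUDevice;
declare const preprocessWgsl: string; // source of src/shaders/preprocess.wgsl

// One compute pipeline per color mode; the constant is folded in at compile time.
const module = device.createShaderModule({ code: preprocessWgsl });
const rawColorPipeline = device.createComputePipeline({
  layout: 'auto',
  compute: {
    module,
    entryPoint: 'preprocess',        // assumed entry point name
    constants: { USE_RAW_COLOR: 1 }, // 0 for the SH pipeline
  },
});
```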
Integration Points
- Point Cloud Module – Supplies GPU buffers and model-param uniforms; exposes `setPrecisionForShader` for quantized ONNX outputs.
- Sorting Module – Provides `PointCloudSortStuff` with the preprocess bind group; receives updated counters and depth keys from the compute pass.
- Renderer Module – Owns the global `splat2D` buffer, orchestrates multi-model dispatch, and triggers a single global radix sort followed by an indirect draw.
- Camera System – `PerspectiveCamera` supplies matrices and focal data; the module handles inversion and Y-flip packing internally.
Related Docs
- Architecture – Bind-group layouts, uniform packing strategies, and shader pipelines.
- API & Inference Reference – Entry points, dispatch arguments, and buffer schemas.
- Sorting Module – Shows how preprocess output feeds global radix sort.
- Renderer Module – Explains how the `splat2D` buffer becomes draw calls.
- Camera Module – Details the matrix/focal data consumed during projection.