Preprocessing Module

The preprocessing module converts GPU-resident 3D Gaussian splats into screen-space splats that the sorter and renderer can consume. It encapsulates the WebGPU compute pipeline (preprocess.wgsl), packs camera and render uniforms, evaluates spherical harmonics (or raw RGB) per Gaussian, and writes the results into the global 2D splat buffer shared by all models in a frame.

Responsibilities

Camera-aware projection – Packages view/projection matrices (with the WebGPU Y‑flip), computes focal lengths, and streams them to the shader every frame.
Point-cloud binding – Reuses the buffers owned by PointCloud (gaussians, sh, draw uniforms, model params) and wires them into the compute bind groups.
3D→2D transformation – Projects positions, culls against clipping boxes, converts the 3×3 covariance to a screen-space ellipse, and optionally performs mip-splatting opacity modulation.
Color evaluation – Executes SH evaluation up to degree 3 or, when useRawColor=true, treats the SH buffer as direct RGBA to match DynamicPointCloud outputs.
Sorter handshake – Emits packed splats into a global splat2D buffer, fills depth keys/payloads, and updates atomic counters (keys_size, indirect dispatch_x) required by GPURSSorter.
Multi-model dispatch – Exposes dispatchModel(...) so the renderer can stream multiple point clouds into contiguous slices of the same output buffers before running a single global radix sort.
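
To make the color-evaluation responsibility concrete, here is a minimal CPU-side sketch of SH evaluation truncated at degree 1. The function name `evalShColor`, the coefficient layout, and the `+ 0.5` offset with clamping are assumptions borrowed from the commonly used 3D Gaussian splatting convention; the actual shader may order or scale its coefficients differently, and it goes up to degree 3.

```typescript
// Standard real spherical-harmonics basis constants for bands 0 and 1.
const SH_C0 = 0.28209479177387814;
const SH_C1 = 0.4886025119029199;

type Rgb = [number, number, number];

/**
 * Evaluate a view-dependent color from SH coefficients, up to degree 1.
 * coeffs[0] is the DC term; coeffs[1..3] are the three degree-1 terms.
 * dir must be a unit vector pointing from the camera toward the Gaussian.
 * (Hypothetical helper; layout and +0.5 offset are assumptions.)
 */
function evalShColor(coeffs: Rgb[], dir: Rgb, maxDegree: number): Rgb {
  const [x, y, z] = dir;
  const out: Rgb = [0, 0, 0];
  for (let c = 0; c < 3; c++) {
    let v = SH_C0 * coeffs[0][c];
    if (maxDegree >= 1 && coeffs.length >= 4) {
      v += -SH_C1 * y * coeffs[1][c]
         +  SH_C1 * z * coeffs[2][c]
         -  SH_C1 * x * coeffs[3][c];
    }
    // Shift into displayable range and clamp negatives.
    out[c] = Math.max(0, v + 0.5);
  }
  return out;
}
```

With `maxDegree = 0` only the DC term contributes, so the color is view-independent; the raw-color path skips this evaluation entirely and reads the stored values as RGBA.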

Core Components

GaussianPreprocessor (src/preprocess/gaussian_preprocessor.ts) – Concrete implementation that owns the compute pipeline, uniform buffers, and dispatchModel.
preprocess.wgsl (src/shaders/preprocess.wgsl) – Shader entry point executed with 256 threads per workgroup; each thread expands one Gaussian.
preprocess/index.ts – Defines PreprocessArgs, IPreprocessor, and re‑exports the concrete class for renderer and tools.
SortedSplats / GPURSSorter – Sorting resources that expose sorter_bg_pre, sorter_uni, and sorter_dis; preprocessing updates these so the radix sorter can run either directly or via indirect dispatch.

Data Flow

[PointCloud GPU buffers] ─┐
                          ├─► GaussianPreprocessor.dispatchModel
[Camera + Render settings]┘            │
                                       ▼
                            Global 2D splat buffer + sort buffers
                                       │
                                       ▼
                           GPURSSorter (radix) ─► Renderer

The renderer typically instantiates two preprocessors (one for SH, one for raw RGB), calls dispatchModel once per point cloud, then runs a single radix sort followed by an indirect draw.

Uniform & Buffer Snapshot

| Resource | Size | Producer | Notes |
| --- | --- | --- | --- |
| Camera uniforms | 272 B | `GaussianPreprocessor.packCameraUniforms` | View, view⁻¹, proj (Y‑flipped), proj⁻¹, viewport, focal. |
| Render settings | 80 B | `packSettingsUniforms` | Clipping boxes, Gaussian scaling, max SH degree, env/mip flags, kernel size, walltime, scene extent, center. |
| Model params | 128 B | `PointCloud.updateModelParamsWithOffset` | Model matrix, base offset, numPoints, shading knobs, precision metadata; flushed per dispatch. |
| Global `splat2D` | `numPoints × 32` B | Renderer | Shared output buffer; `dispatchModel` writes into `[baseOffset, baseOffset + count)` slice. |
| Sort buffers (`sorter_bg_pre`) | variable | `GPURSSorter` | Holds key/value ping‑pong buffers, indirect dispatch, and atomic counters that preprocessing updates through atomics. |
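
The 272-byte camera block is consistent with four 4×4 float matrices (256 B) followed by a two-float viewport and a two-float focal pair. The sketch below packs such a block on the CPU. The field order, the Y-flip implemented by negating the Y row of a column-major projection matrix, and the names `packCameraUniformsSketch`, `CameraInputs`, `fovX`, and `fovY` are all assumptions for illustration, not the module's actual code. The inverses are taken as inputs because the text does not say whether proj⁻¹ is the inverse of the flipped or unflipped matrix.

```typescript
// 4 mat4 (64 floats) + viewport (2 floats) + focal (2 floats) = 68 floats = 272 bytes.
const CAMERA_UNIFORM_FLOATS = 68;

// Hypothetical input shape; matrices are 16-element, column-major.
interface CameraInputs {
  view: Float32Array;
  viewInv: Float32Array;
  proj: Float32Array;
  projInv: Float32Array;
  fovX: number; // horizontal field of view in radians
  fovY: number; // vertical field of view in radians
}

/** Negate the Y row of a column-major 4x4 matrix (indices 1, 5, 9, 13). */
function flipY(proj: Float32Array): Float32Array {
  const out = new Float32Array(proj); // copies the input
  const yRow = [1, 5, 9, 13];
  for (let k = 0; k < yRow.length; k++) {
    out[yRow[k]] = -out[yRow[k]];
  }
  return out;
}

/** Pinhole focal length in pixels for a given field of view. */
function focalFromFov(fovRadians: number, pixels: number): number {
  return pixels / (2 * Math.tan(fovRadians / 2));
}

function packCameraUniformsSketch(cam: CameraInputs, viewport: [number, number]): Float32Array {
  const data = new Float32Array(CAMERA_UNIFORM_FLOATS);
  data.set(cam.view, 0);
  data.set(cam.viewInv, 16);
  data.set(flipY(cam.proj), 32);
  data.set(cam.projInv, 48);
  data[64] = viewport[0];
  data[65] = viewport[1];
  data[66] = focalFromFov(cam.fovX, viewport[0]);
  data[67] = focalFromFov(cam.fovY, viewport[1]);
  return data;
}
```

The resulting array could then be uploaded with `device.queue.writeBuffer(cameraBuffer, 0, data)`, where `cameraBuffer` stands in for whatever uniform buffer the preprocessor owns.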

Compute Stage (per Gaussian)

Load and unpack the f16-packed position, opacity, covariance, and SH/RAW coefficients.
Transform position into camera space; apply clipping‑box and frustum checks up-front to drop invisible splats early.
Project the 3×3 symmetric covariance into screen space using the view transform W and the analytical Jacobian J of the perspective projection (Σ₂D = Jᵀ · W · Σ₃D · Wᵀ · J).
Eigen-decompose the 2×2 covariance to derive the ellipse axes; add kernelSize for anti-aliasing and apply the optional mip-splatting opacity adjustment.
Evaluate color via SH polynomials (degree limited by settings.maxSHDegree) or, if USE_RAW_COLOR=1, reinterpret the SH buffer as packed RGBA.
Write output into the global splat2D buffer (packed f16 axes, position, color) and emit matching depth keys / payload indices for the sorter.
Update counters using atomics (keys_size, dispatch_x) so the radix sorter can either run a fixed dispatch or consume the indirect counts produced during preprocessing.
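
The projection and eigen-decomposition steps above can be sketched on the CPU as follows. This uses the textbook EWA form with a 2×3 Jacobian, Σ₂D = J · W · Σ₃D · Wᵀ · Jᵀ; where the transposes fall in the shader depends on its matrix storage convention. The function name, the row-major array layout, adding `kernelSize` to the diagonal before the eigen-decomposition, and the determinant-ratio opacity factor (the usual mip-splatting formulation) are assumptions, not a transcription of preprocess.wgsl.

```typescript
/** Multiply two row-major matrices given as arrays of rows. */
function matMul(a: number[][], b: number[][]): number[][] {
  return a.map(row => b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0)));
}

/** Transpose a row-major matrix. */
function transpose(m: number[][]): number[][] {
  return m[0].map((_, j) => m.map(row => row[j]));
}

interface ProjectedSplat {
  lambda1: number;      // larger eigenvalue of the filtered 2x2 covariance
  lambda2: number;      // smaller eigenvalue
  opacityScale: number; // mip-splatting style factor, 1 when kernelSize is 0
}

/**
 * cov3d:   3x3 world-space covariance (row-major)
 * viewRot: 3x3 rotation part of the view matrix (row-major)
 * t:       Gaussian center in camera space, with t[2] != 0
 * focal:   [fx, fy] in pixels
 */
function projectCovariance(
  cov3d: number[][],
  viewRot: number[][],
  t: [number, number, number],
  focal: [number, number],
  kernelSize: number,
): ProjectedSplat {
  const [x, y, z] = t;
  const [fx, fy] = focal;
  // Jacobian of the perspective projection (u, v) = (fx * x / z, fy * y / z).
  const J = [
    [fx / z, 0, (-fx * x) / (z * z)],
    [0, fy / z, (-fy * y) / (z * z)],
  ];
  const T = matMul(J, viewRot);                         // 2x3
  const cov2d = matMul(matMul(T, cov3d), transpose(T)); // 2x2
  const b = cov2d[0][1];
  const detBefore = cov2d[0][0] * cov2d[1][1] - b * b;
  // Low-pass filter: widen the footprint by kernelSize on the diagonal.
  const a = cov2d[0][0] + kernelSize;
  const c = cov2d[1][1] + kernelSize;
  const det = a * c - b * b;
  // Closed-form eigenvalues of a symmetric 2x2 matrix.
  const mid = 0.5 * (a + c);
  const r = Math.sqrt(Math.max(0, mid * mid - det));
  return {
    lambda1: mid + r,
    lambda2: mid - r,
    opacityScale: Math.sqrt(Math.max(0, detBefore) / det),
  };
}
```

The square roots of `lambda1` and `lambda2` give the ellipse standard deviations in pixels along its two principal axes.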

Multi-model Dispatch

GaussianPreprocessor.dispatchModel accepts:

camera, viewport – PerspectiveCamera plus [width, height].
pointCloud – Provides Gaussian/SH buffers, draw uniforms, and per-model uniform buffer.
sortStuff – Global PointCloudSortStuff containing sorter_bg_pre, sorter_uni, and sorter_dis.
settings – Renderer-generated struct (scales, SH degree, clipping boxes, etc.).
modelMatrix & baseOffset – Passed to PointCloud.updateModelParamsWithOffset so each model writes into a unique slice of the global splat buffer.
global.splat2D – Output storage shared by every model in the frame.
countBuffer? – Optional ONNX-produced GPU buffer; its value overwrites modelParams.num_points (byte offset 68) after uniforms are flushed, enabling indirect draws to track dynamic point counts.

The method also calls pointCloud.setPrecisionForShader() when available so quantized buffers publish their scales/zero-points into the uniform block before dispatch.
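
As an illustration of the per-model uniform block, the sketch below packs a 128-byte buffer with the model matrix in the first 64 bytes and `num_points` at byte offset 68, as stated above. Placing `baseOffset` at byte 64 is an assumption (the text only fixes the offset of `num_points`), the remaining bytes are left zeroed because their layout is not described here, and the helper names are hypothetical.

```typescript
const MODEL_PARAMS_BYTES = 128;
const BASE_OFFSET_BYTE = 64; // assumption: directly after the 64-byte model matrix
const NUM_POINTS_BYTE = 68;  // stated in the text

/** Pack the leading fields of the model-params block; the tail stays zeroed. */
function packModelParamsSketch(
  modelMatrix: Float32Array,
  baseOffset: number,
  numPoints: number,
): ArrayBuffer {
  if (modelMatrix.length !== 16) {
    throw new Error("modelMatrix must contain 16 floats");
  }
  const buf = new ArrayBuffer(MODEL_PARAMS_BYTES);
  new Float32Array(buf, 0, 16).set(modelMatrix);
  const view = new DataView(buf);
  view.setUint32(BASE_OFFSET_BYTE, baseOffset, true); // little-endian, as WebGPU expects
  view.setUint32(NUM_POINTS_BYTE, numPoints, true);
  return buf;
}

/** CPU stand-in for the GPU-side override performed when countBuffer is supplied. */
function overrideNumPoints(params: ArrayBuffer, dynamicCount: number): void {
  new DataView(params).setUint32(NUM_POINTS_BYTE, dynamicCount, true);
}
```

On the GPU the equivalent override would be a 4-byte `encoder.copyBufferToBuffer(countBuffer, 0, modelParamsBuffer, 68, 4)` recorded after the uniforms are written, which keeps the dynamic count on the device without a CPU readback; `modelParamsBuffer` is a placeholder name here.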

Usage Example

// Initialization (renderer boot)
const preprocessorSH = new GaussianPreprocessor();
await preprocessorSH.initialize(device, /*shDegree*/ 3, /*useRawColor*/ false);

const preprocessorRGB = new GaussianPreprocessor();
await preprocessorRGB.initialize(device, /*shDegree*/ 0, /*useRawColor*/ true);

// Per frame: preprocess every point cloud into the global buffers
const encoder = device.createCommandEncoder();
let baseOffset = 0;
for (const pc of pointClouds) {
  const pre = pc.colorMode === 'rgb' ? preprocessorRGB : preprocessorSH;
  const countBuffer = 'countBuffer' in pc ? pc.countBuffer?.() : undefined;
  pre.dispatchModel({
    camera,
    viewport: [width, height],
    pointCloud: pc,
    sortStuff: globalSortStuff,
    settings: buildRenderSettings(pc, frameSettings),
    modelMatrix: pc.transform,
    baseOffset,
    global: { splat2D: globalSplatBuffer },
    countBuffer,
  }, encoder);
  baseOffset += pc.numPoints;
}
// Later in the frame: run global radix sort + renderer draw
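
The running `baseOffset` in the loop above assigns each model a contiguous, non-overlapping slice of the global buffer. A small helper such as the following could validate those slices against the buffer capacity before any dispatch is recorded. `planSlices`, `ModelSlice`, and `SPLAT2D_STRIDE_BYTES` are hypothetical names; the 32-byte stride is taken from the buffer table.

```typescript
const SPLAT2D_STRIDE_BYTES = 32; // per-splat size from the "Global splat2D" table row

interface ModelSlice {
  baseOffset: number; // first splat index owned by this model
  count: number;      // number of splats in the slice
}

/** Assign contiguous slices to each model and reject overflow of the global buffer. */
function planSlices(counts: number[], capacitySplats: number): ModelSlice[] {
  const slices: ModelSlice[] = [];
  let baseOffset = 0;
  for (const count of counts) {
    if (baseOffset + count > capacitySplats) {
      throw new Error(
        `global splat buffer too small: need ${baseOffset + count}, have ${capacitySplats}`,
      );
    }
    slices.push({ baseOffset, count });
    baseOffset += count;
  }
  return slices;
}

/** Bytes required for a global splat2D buffer holding every model. */
function requiredSplatBytes(counts: number[]): number {
  return counts.reduce((sum, n) => sum + n, 0) * SPLAT2D_STRIDE_BYTES;
}
```

Checking capacity up front turns a silent out-of-bounds write in the compute shader into an explicit error on the CPU.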

Performance & Tuning

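A workgroup size of 256 is chosen to keep occupancy high on both desktop and mobile GPUs; since each thread handles one Gaussian, a model with N Gaussians needs ⌈N / 256⌉ workgroups.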
Early exits (frustum + clipping box) avoid unnecessary matrix math for culled splats.
Packed f16 buffers halve bandwidth for Gaussian attributes and SH data.
Atomic contention grows with extremely high visibility rates; consider aggressive clipping boxes or smaller gaussianScaling on dense scenes.
Dual pipelines (SH vs raw RGB) avoid runtime branches; USE_RAW_COLOR is compiled into the shader via pipeline constants.
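
For reference, the dispatch size implied by a 256-thread workgroup can be computed as below. The sketch assumes a one-dimensional dispatch and checks against WebGPU's default `maxComputeWorkgroupsPerDimension` limit of 65535; whether the preprocessor actually dispatches in one dimension is an assumption, and `workgroupCount` is a hypothetical helper.

```typescript
const WORKGROUP_SIZE = 256;           // threads per workgroup, one Gaussian per thread
const DEFAULT_MAX_WORKGROUPS = 65535; // WebGPU default maxComputeWorkgroupsPerDimension

/** Number of workgroups needed to cover numPoints Gaussians in a 1D dispatch. */
function workgroupCount(numPoints: number, maxWorkgroups: number = DEFAULT_MAX_WORKGROUPS): number {
  const groups = Math.ceil(numPoints / WORKGROUP_SIZE);
  if (groups > maxWorkgroups) {
    throw new Error(`${groups} workgroups exceed the per-dimension limit of ${maxWorkgroups}`);
  }
  return groups;
}
```

Under these assumptions a single 1D dispatch covers at most 65535 × 256 = 16,776,960 Gaussians; larger models would need a 2D dispatch, several dispatches, or a device created with a higher limit.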

Integration Points

Point Cloud Module – Supplies GPU buffers and model-param uniforms; exposes setPrecisionForShader for quantized ONNX outputs.
Sorting Module – Provides PointCloudSortStuff with the preprocess bind group; receives updated counters and depth keys from the compute pass.
Renderer Module – Owns the global splat2D buffer, orchestrates multi-model dispatch, and triggers a single global radix sort followed by an indirect draw.
Camera System – PerspectiveCamera supplies matrices and focal data; the module handles inversion and Y‑flip packing internally.

Related Documentation

Architecture – Bind-group layouts, uniform packing strategies, and shader pipelines.
API & Inference Reference – Entry points, dispatch arguments, and buffer schemas.
Sorting Module – Shows how preprocess output feeds global radix sort.
Renderer Module – Explains how the splat2D buffer becomes draw calls.
Camera Module – Details the matrix/focal data consumed during projection.