Preprocessing Module

The preprocessing module converts GPU-resident 3D Gaussian splats into screen-space splats that the sorter and renderer can consume. It encapsulates the WebGPU compute pipeline (preprocess.wgsl), packs camera + render uniforms, evaluates spherical harmonics (or raw RGB) per Gaussian, and writes results into the global 2D splat buffer shared across all models in a frame.

Responsibilities

  • Camera-aware projection – Packages view/projection matrices (with the WebGPU Y‑flip), computes focal lengths, and streams them to the shader every frame.
  • Point-cloud binding – Reuses the buffers owned by PointCloud (gaussians, sh, draw uniforms, model params) and wires them into the compute bind groups.
  • 3D→2D transformation – Projects positions, clamps by clipping boxes, converts 3×3 covariance to screen-space ellipses, and optionally performs mip-splatting opacity modulation.
  • Color evaluation – Executes SH evaluation up to degree 3 or, when useRawColor=true, treats the SH buffer as direct RGBA to match DynamicPointCloud outputs.
  • Sorter handshake – Emits packed splats into a global splat2D buffer, fills depth keys/payloads, and updates atomic counters (keys_size, indirect dispatch_x) required by GPURSSorter.
  • Multi-model dispatch – Exposes dispatchModel(...) so the renderer can stream multiple point clouds into contiguous slices of the same output buffers before running a single global radix sort.
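
To make the color-evaluation responsibility concrete, here is a minimal CPU-side sketch of SH evaluation limited to degrees 0 and 1. It assumes the shader follows the convention of the reference 3D Gaussian Splatting implementation (band constants, sign pattern, and the +0.5 offset); evalSH, Vec3, and the coefficient layout are hypothetical names for illustration, not the module's API.

```typescript
// Real spherical-harmonics basis constants for bands 0 and 1.
const SH_C0 = 0.28209479177387814;
const SH_C1 = 0.4886025119029199;

type Vec3 = [number, number, number];

/**
 * Evaluate view-dependent color from SH coefficients (degree 0 or 1 only).
 * coeffs[k] holds the RGB coefficient of basis function k (assumed layout);
 * dir is the unit direction from the camera to the Gaussian.
 */
function evalSH(degree: 0 | 1, coeffs: Vec3[], dir: Vec3): Vec3 {
  const [x, y, z] = dir;
  const out: Vec3 = [0, 0, 0];
  for (let ch = 0; ch < 3; ch++) {
    // Band 0: constant (view-independent) term.
    let v = SH_C0 * coeffs[0][ch];
    if (degree >= 1) {
      // Band 1: linear in the view direction.
      v += SH_C1 * (-y * coeffs[1][ch] + z * coeffs[2][ch] - x * coeffs[3][ch]);
    }
    // Shift by 0.5 and clamp negatives, as in the reference implementation.
    out[ch] = Math.max(0, v + 0.5);
  }
  return out;
}
```

Degrees 2 and 3 add 5 and 7 further basis functions respectively; they are omitted here to keep the sketch short.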

Core Components

  • GaussianPreprocessor (src/preprocess/gaussian_preprocessor.ts) – Concrete implementation that owns the compute pipeline, uniform buffers, and dispatchModel.
  • preprocess.wgsl (src/shaders/preprocess.wgsl) – Shader entry point executed with 256 threads per workgroup; each thread expands one Gaussian.
  • preprocess/index.ts – Defines PreprocessArgs, IPreprocessor, and re‑exports the concrete class for renderer and tools.
  • SortedSplats / GPURSSorter – Sorting resources that expose sorter_bg_pre, sorter_uni, and sorter_dis; preprocessing updates these so the radix sorter can run either directly or via indirect dispatch.

Data Flow

[PointCloud GPU buffers] ─┐
                          ├─► GaussianPreprocessor.dispatchModel
[Camera + Render settings]┘            │
                                       ▼
                            Global 2D splat buffer + sort buffers
                                       │
                                       ▼
                           GPURSSorter (radix) ─► Renderer

The renderer typically instantiates two preprocessors (SH + raw RGB), calls dispatchModel once per point cloud, then runs a single radix sort followed by an indirect draw.

Uniform & Buffer Snapshot

Resource | Size | Producer | Notes
Camera uniforms | 272 B | GaussianPreprocessor.packCameraUniforms | View, view⁻¹, proj (Y‑flipped), proj⁻¹, viewport, focal.
Render settings | 80 B | packSettingsUniforms | Clipping boxes, Gaussian scaling, max SH degree, env/mip flags, kernel size, walltime, scene extent, center.
Model params | 128 B | PointCloud.updateModelParamsWithOffset | Model matrix, base offset, numPoints, shading knobs, precision metadata; flushed per dispatch.
Global splat2D | numPoints × 32 B | Renderer | Shared output buffer; dispatchModel writes into the [baseOffset, baseOffset + count) slice.
Sort buffers (sorter_bg_pre) | variable | GPURSSorter | Holds key/value ping‑pong buffers, the indirect dispatch buffer, and the counters that preprocessing updates through atomics.
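
The 272 B camera block is consistent with four 4×4 f32 matrices (4 × 64 B) plus two vec2 values (2 × 8 B). The sketch below packs such a block on the CPU. The field order is assumed from the Notes column, the matrices are assumed column-major, and packCameraUniformsSketch, flipProjectionY, and focalFromProjection are hypothetical helpers rather than the module's actual methods.

```typescript
type Mat4 = Float32Array; // 16 floats, column-major (assumption)

// 4 matrices * 16 floats + viewport (2) + focal (2) = 68 floats = 272 bytes.
const CAMERA_UNIFORM_FLOATS = 4 * 16 + 2 + 2;

/** Return a copy of a column-major projection with its clip-space Y row negated. */
function flipProjectionY(proj: Mat4): Mat4 {
  const out = new Float32Array(proj);
  for (const i of [1, 5, 9, 13]) out[i] = -out[i]; // row 1 in column-major storage
  return out;
}

/** Focal lengths in pixels derived from a perspective projection matrix. */
function focalFromProjection(proj: Mat4, width: number, height: number): [number, number] {
  return [Math.abs(proj[0]) * width * 0.5, Math.abs(proj[5]) * height * 0.5];
}

/**
 * Pack view, view inverse, projection, projection inverse, viewport, focal.
 * The caller passes the already Y-flipped projection and its matching inverse.
 */
function packCameraUniformsSketch(
  view: Mat4, viewInv: Mat4, proj: Mat4, projInv: Mat4, viewport: [number, number],
): Float32Array {
  const data = new Float32Array(CAMERA_UNIFORM_FLOATS);
  data.set(view, 0);
  data.set(viewInv, 16);
  data.set(proj, 32);
  data.set(projInv, 48);
  data.set(viewport, 64);
  data.set(focalFromProjection(proj, viewport[0], viewport[1]), 66);
  return data;
}
```

The resulting array can be uploaded with device.queue.writeBuffer; its byteLength matches the 272 B listed in the table.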

Compute Stage (per Gaussian)

  1. Load & unpack packed f16 position, opacity, covariance, and SH/RAW coefficients.
  2. Transform position into camera space; apply clipping‑box and frustum checks up-front to drop invisible splats early.
  3. Project covariance into screen space using the rotation part W of the view matrix and the analytical Jacobian J of the perspective projection (Σ₂D = J · W · Σ₃D · Wᵀ · Jᵀ in the standard EWA column-vector convention).
  4. Eigendecompose the 2×2 covariance to derive the ellipse axes; add kernelSize for anti-aliasing and apply the optional mip-splatting opacity adjustment.
  5. Evaluate color via SH polynomials (degree limited by settings.maxSHDegree) or, if USE_RAW_COLOR=1, reinterpret the SH buffer as packed RGBA.
  6. Write output into the global splat2D buffer (packed f16 axes, position, color) and emit matching depth keys / payload indices for the sorter.
  7. Update counters using atomics (keys_size, dispatch_x) so the radix sorter can either run a fixed dispatch or consume the indirect counts produced during preprocessing.
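
Step 4 can be illustrated on the CPU with a closed-form eigendecomposition of a symmetric 2×2 matrix. This is a sketch under assumptions, not the shader's code: it assumes kernelSize is added as a variance to the diagonal, that the axes are scaled to one standard deviation, and that the mip-splatting adjustment is the determinant ratio sqrt(det Σ / det(Σ + kI)). The names ellipseFromCov2D and Ellipse are hypothetical.

```typescript
interface Ellipse {
  major: [number, number]; // major axis, scaled by sqrt(largest eigenvalue)
  minor: [number, number]; // minor axis, perpendicular to major
  opacityScale: number;    // multiply the Gaussian's opacity by this factor
}

/** Covariance is [[a, b], [b, c]] in pixel units. */
function ellipseFromCov2D(
  a: number, b: number, c: number, kernelSize: number, mipSplatting: boolean,
): Ellipse {
  const detBefore = a * c - b * b;
  // Low-pass filter: widen the Gaussian by the kernel variance.
  const aa = a + kernelSize;
  const cc = c + kernelSize;
  const detAfter = aa * cc - b * b;

  // Eigenvalues of a symmetric 2x2 matrix: mid +/- sqrt(mid^2 - det).
  const mid = 0.5 * (aa + cc);
  const radius = Math.sqrt(Math.max(0, mid * mid - detAfter));
  const l1 = mid + radius;
  const l2 = Math.max(0, mid - radius);

  // Unit eigenvector for l1; (b, l1 - aa) solves (A - l1 I) v = 0 when b != 0.
  let vx: number;
  let vy: number;
  if (Math.abs(b) < 1e-12) {
    [vx, vy] = aa >= cc ? [1, 0] : [0, 1];
  } else {
    const len = Math.hypot(b, l1 - aa);
    vx = b / len;
    vy = (l1 - aa) / len;
  }

  const s1 = Math.sqrt(l1);
  const s2 = Math.sqrt(l2);
  const opacityScale =
    mipSplatting && detAfter > 0 ? Math.sqrt(Math.max(0, detBefore / detAfter)) : 1;

  return { major: [vx * s1, vy * s1], minor: [-vy * s2, vx * s2], opacityScale };
}
```

For example, the covariance [[2, 1], [1, 2]] has eigenvalues 3 and 1, so the major axis points along (1, 1)/√2 with length √3.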

Multi-model Dispatch

GaussianPreprocessor.dispatchModel accepts:

  • camera, viewport – PerspectiveCamera plus [width, height].
  • pointCloud – Provides Gaussian/SH buffers, draw uniforms, and per-model uniform buffer.
  • sortStuff – Global PointCloudSortStuff containing sorter_bg_pre, sorter_uni, and sorter_dis.
  • settings – Renderer-generated struct (scales, SH degree, clipping boxes, etc.).
  • modelMatrix & baseOffset – Passed to PointCloud.updateModelParamsWithOffset so each model writes into a unique slice of the global splat buffer.
  • global.splat2D – Output storage shared by every model in the frame.
  • countBuffer? – Optional ONNX-produced GPU buffer; its value overwrites modelParams.num_points (byte offset 68) after uniforms are flushed, enabling indirect draws to track dynamic point counts.

The method also calls pointCloud.setPrecisionForShader() when available so quantized buffers publish their scales/zero-points into the uniform block before dispatch.
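
The countBuffer override described above could be expressed as a single GPU-side copy. The sketch below assumes the count is a u32 stored at byte 0 of countBuffer and that both buffers were created with the required COPY_SRC / COPY_DST usages; overrideNumPoints and the CopyEncoder interface are hypothetical, with CopyEncoder mirroring the real GPUCommandEncoder.copyBufferToBuffer signature so the snippet type-checks without WebGPU typings.

```typescript
// Structural subset of GPUCommandEncoder; B stands in for GPUBuffer.
interface CopyEncoder<B> {
  copyBufferToBuffer(
    source: B, sourceOffset: number,
    destination: B, destinationOffset: number,
    size: number,
  ): void;
}

const NUM_POINTS_BYTE_OFFSET = 68; // modelParams.num_points, per the text above
const U32_BYTES = 4;

/**
 * Record a copy that overwrites num_points with the GPU-produced count.
 * Must be recorded after the model-param uniforms are flushed, otherwise
 * the CPU-side write would clobber the copied value.
 * Returns true when a copy was recorded.
 */
function overrideNumPoints<B>(
  encoder: CopyEncoder<B>, countBuffer: B | undefined, modelParams: B,
): boolean {
  if (!countBuffer) return false; // static point clouds keep the CPU-side count
  encoder.copyBufferToBuffer(countBuffer, 0, modelParams, NUM_POINTS_BYTE_OFFSET, U32_BYTES);
  return true;
}
```

Offset 68 and size 4 satisfy WebGPU's requirement that copy offsets and sizes be multiples of 4 bytes.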

Usage Example

// Initialization (renderer boot)
const preprocessorSH = new GaussianPreprocessor();
await preprocessorSH.initialize(device, /*shDegree*/ 3, /*useRawColor*/ false);

const preprocessorRGB = new GaussianPreprocessor();
await preprocessorRGB.initialize(device, /*shDegree*/ 0, /*useRawColor*/ true);

// Per frame: preprocess every point cloud into the global buffers
const encoder = device.createCommandEncoder();
let baseOffset = 0;
for (const pc of pointClouds) {
  const pre = pc.colorMode === 'rgb' ? preprocessorRGB : preprocessorSH;
  const countBuffer = 'countBuffer' in pc ? pc.countBuffer?.() : undefined;
  pre.dispatchModel({
    camera,
    viewport: [width, height],
    pointCloud: pc,
    sortStuff: globalSortStuff,
    settings: buildRenderSettings(pc, frameSettings),
    modelMatrix: pc.transform,
    baseOffset,
    global: { splat2D: globalSplatBuffer },
    countBuffer,
  }, encoder);
  baseOffset += pc.numPoints;
}
// Later in the frame: run global radix sort + renderer draw
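
The baseOffset bookkeeping in the loop above, together with the 256-thread workgroup size and the 32 B splat stride quoted earlier, can be precomputed per frame. This is a small illustrative helper; planSlices and ModelSlice are hypothetical names, not part of the module.

```typescript
const WORKGROUP_SIZE = 256;      // threads per workgroup, one Gaussian per thread
const SPLAT2D_STRIDE_BYTES = 32; // bytes per packed 2D splat

interface ModelSlice {
  baseOffset: number; // first splat index this model writes
  count: number;      // number of Gaussians in the model
  workgroups: number; // dispatch size along x
}

/** Lay out contiguous slices of the global splat2D buffer for each model. */
function planSlices(counts: number[]): {
  slices: ModelSlice[];
  totalPoints: number;
  splat2DBytes: number;
} {
  const slices: ModelSlice[] = [];
  let baseOffset = 0;
  for (const count of counts) {
    slices.push({ baseOffset, count, workgroups: Math.ceil(count / WORKGROUP_SIZE) });
    baseOffset += count;
  }
  return { slices, totalPoints: baseOffset, splat2DBytes: baseOffset * SPLAT2D_STRIDE_BYTES };
}
```

For three models with 1000, 300, and 256 points, the slices start at 0, 1000, and 1300, and the shared buffer needs room for 1556 splats.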

Performance & Tuning

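  • Workgroup size 256 is chosen to keep occupancy high on both desktop and mobile GPUs; it also equals WebGPU's default maxComputeInvocationsPerWorkgroup limit, so no elevated device limits are required.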
  • Early exits (frustum + clipping box) avoid unnecessary matrix math for culled splats.
  • Packed f16 buffers halve memory bandwidth relative to f32 for Gaussian attributes and SH data.
  • Atomic contention grows with extremely high visibility rates; consider aggressive clipping boxes or smaller gaussianScaling on dense scenes.
  • Dual pipelines (SH vs raw RGB) avoid runtime branches; USE_RAW_COLOR is compiled into the shader via pipeline constants.
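
The dual-pipeline approach relies on WebGPU pipeline-overridable constants. The sketch below builds the descriptor for each variant; it assumes the shader declares something like `override USE_RAW_COLOR: u32;` and uses a hypothetical entry point name ("preprocess"). The descriptor shape (layout, compute.module, compute.entryPoint, compute.constants) matches the real GPUComputePipelineDescriptor, but is re-declared locally so the snippet stands alone.

```typescript
// Local mirror of the relevant GPUComputePipelineDescriptor fields;
// M stands in for GPUShaderModule.
interface ComputePipelineDescriptorSketch<M> {
  layout: "auto";
  compute: {
    module: M;
    entryPoint: string;
    constants: Record<string, number>;
  };
}

/** Build a descriptor with USE_RAW_COLOR baked in at pipeline-creation time. */
function preprocessPipelineDescriptor<M>(
  module: M,
  useRawColor: boolean,
  entryPoint = "preprocess", // hypothetical entry point name
): ComputePipelineDescriptorSketch<M> {
  return {
    layout: "auto",
    compute: {
      module,
      entryPoint,
      // Override constants are passed as numbers; 1 selects the raw-RGBA path.
      constants: { USE_RAW_COLOR: useRawColor ? 1 : 0 },
    },
  };
}
```

Each descriptor would then be passed to device.createComputePipeline (or createComputePipelineAsync), yielding two specialized pipelines from one shader module.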

Integration Points

  • Point Cloud Module – Supplies GPU buffers and model-param uniforms; exposes setPrecisionForShader for quantized ONNX outputs.
  • Sorting Module – Provides PointCloudSortStuff with the preprocess bind group; receives updated counters and depth keys from the compute pass.
  • Renderer Module – Owns the global splat2D buffer, orchestrates multi-model dispatch, and triggers a single global radix sort followed by an indirect draw.
  • Camera System – PerspectiveCamera supplies matrices and focal data; the module handles inversion and Y‑flip packing internally.