Renderer Module Architecture

The renderer module orchestrates the complete rendering pipeline for Gaussian splatting, coordinating preprocessing, GPU radix sorting, and indirect draw execution. GaussianRenderer serves as the high-level coordinator that manages static GPU resources, dual preprocessors (SH and RGB), a shared sorter instance, and global buffers for multi-model batching.

High-level Pipeline

┌────────────────────────────────────────────────────────────┐
│ Application Layer                                          │
│  RenderLoop / App  →  prepareMulti() / renderMulti()       │
└───────────────▲────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│ Renderer Layer                                             │
│  GaussianRenderer                                          │
│  • Static pipelines (render, depth)                        │
│  • Dual preprocessors (SH, RGB)                            │
│  • Single GPURSSorter instance                             │
│  • Global buffers (splat2D, sorter payload)                │
└───────────────┬────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│ Preparation Phase (prepareMulti)                           │
│  1. ensureGlobalCapacity(total points)                     │
│  2. reset sorter indirect buffers                          │
│  3. For each PointCloud:                                   │
│     • Select preprocessor (SH vs RGB)                      │
│     • dispatchModel(baseOffset, countBuffer?)              │
│  4. recordSortIndirect(global sort)                        │
│  5. Copy keys_size → drawIndirect.instanceCount            │
└───────────────┬────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│ Rendering Phase (renderMulti)                              │
│  1. Bind global renderBG (@group(0))                       │
│  2. Bind global sorter_render_bg (@group(1))               │
│  3. Set pipeline (depth-aware if enabled)                  │
│  4. drawIndirect(once for all models)                      │
└────────────────────────────────────────────────────────────┘

Key Design Principles

Static Resource Caching: Pipelines, layouts, and indirect buffers are created once during initialize() and reused across frames
Dual Preprocessor Support: Separate GaussianPreprocessor instances for SH (spherical harmonics) and RGB models enable efficient color mode switching
Global Sorting: Single shared sorter instance with global buffers allows multi-model batching into one indirect draw
Per-Cloud Caching: WeakMap<PointCloud, PointCloudSortStuff> caches sort resources per point cloud, rebuilding only when point count changes
Capacity Management: Global buffers grow with a 1.25× factor to amortize reallocations when total point count increases
Depth Pipeline Variant: Depth-enabled pipeline is created during initialize() with the depth24plus format and recreated when setDepthFormat() changes the format

Resource Lifecycle

Initialization (`initialize()`)

Sorter Creation: GPURSSorter.create(device, queue) - async initialization with adapter testing
Preprocessor Setup: Two instances created:
preprocessorSH: Initialized with shDegree (default 3)
preprocessorRGB: Initialized with degree 0 for direct RGB models
Pipeline Layout: Combines PointCloud.renderBindGroupLayout and GPURSSorter.createRenderBindGroupLayout
Render Pipelines: Two variants created:
Standard pipeline: No depth testing
Depth pipeline: Created with initial depth24plus format (recreated on format change)
Indirect Draw Buffer: 16-byte buffer initialized with {vertexCount: 4, instanceCount: 0, firstVertex: 0, firstInstance: 0}
Global Buffers: Initial 1M-splat capacity allocated (splat2D buffer + PointCloudSortStuff)
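
The 16-byte indirect draw buffer described above maps onto four 32-bit unsigned integers. The following is a minimal sketch of its initial contents, not the renderer's actual code; `makeInitialDrawArgs` and `INSTANCE_COUNT_BYTE_OFFSET` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper (not part of the renderer's API): builds the initial
// contents of the 16-byte indirect draw buffer. The field order follows the
// WebGPU drawIndirect argument layout: four consecutive 32-bit unsigned ints.
const INSTANCE_COUNT_BYTE_OFFSET = 4; // instanceCount occupies bytes [4, 8)

function makeInitialDrawArgs(): Uint32Array {
  return new Uint32Array([
    4, // vertexCount: one triangle-strip quad per splat
    0, // instanceCount: filled in each frame from the sorter's keys_size
    0, // firstVertex
    0, // firstInstance
  ]);
}

const initialDrawArgs = makeInitialDrawArgs();
console.log(initialDrawArgs.byteLength); // 16
```

Because instanceCount is the second field, the per-frame update only needs to overwrite bytes 4 through 7.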

Frame Preparation (`prepareMulti`)

Capacity Check: Sum all pointCloud.numPoints, grow global buffers if needed (1.25× growth factor)
Sorter Reset: recordResetIndirectBuffer clears keys_size and dispatch_x counters
Per-Model Dispatch:
Calculate baseOffset for each model (cumulative sum of previous models' point counts)
Detect color mode: pointCloud.colorMode → select preprocessorSH or preprocessorRGB
Build render settings: Merge RenderArgs with per-cloud metadata (bbox, center, kernelSize, etc.)
Call preprocessor.dispatchModel() with:
- Global splat2D buffer and sortStuff
- Model-specific baseOffset and transform matrix
- Optional ONNX countBuffer for dynamic models
Global Sort: Single recordSortIndirect call processes all models together
Indirect Draw Update: Copy sorter_uni.keys_size (4 bytes) into drawIndirectBuffer[4:8] to set instance count
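
The offset bookkeeping and the final 4-byte copy can be sketched as follows. This is an illustration under stated assumptions, not the renderer's implementation: `CloudLike`, `CopyEncoderLike`, `computeBaseOffsets`, and `recordInstanceCountCopy` are hypothetical names, and the sketch assumes keys_size sits at byte offset 0 of sorter_uni. The `copyBufferToBuffer` parameter order matches the WebGPU command encoder method.

```typescript
// Hypothetical shapes for illustration; only numPoints comes from the text above.
interface CloudLike {
  numPoints: number;
}

// Minimal structural stand-in for GPUCommandEncoder.copyBufferToBuffer.
interface CopyEncoderLike<B> {
  copyBufferToBuffer(src: B, srcOffset: number, dst: B, dstOffset: number, size: number): void;
}

// offsets[i] is the cumulative point count of all models before model i.
function computeBaseOffsets(clouds: CloudLike[]): { offsets: number[]; total: number } {
  const offsets: number[] = [];
  let running = 0;
  for (const cloud of clouds) {
    offsets.push(running);
    running += cloud.numPoints;
  }
  return { offsets, total: running };
}

// Copies the 4-byte visible-splat count into instanceCount (bytes [4, 8)).
// Assumption: keys_size is the first field of sorter_uni (source offset 0).
function recordInstanceCountCopy<B>(encoder: CopyEncoderLike<B>, sorterUni: B, drawIndirect: B): void {
  encoder.copyBufferToBuffer(sorterUni, 0, drawIndirect, 4, 4);
}

const exampleOffsets = computeBaseOffsets([{ numPoints: 100 }, { numPoints: 250 }, { numPoints: 50 }]);
console.log(exampleOffsets.offsets, exampleOffsets.total); // [0, 100, 350] and 400
```

The total returned here is the value that would be handed to ensureGlobalCapacity before any model is dispatched.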

Frame Rendering (`renderMulti`)

Bind Groups:
@group(0): Global renderBG (binds global splat2D buffer)
@group(1): Global sorter_render_bg (binds sorter_uni and payload_a)
Pipeline Selection: Use depth pipeline if useDepth && pipelineDepth exists, otherwise standard pipeline
Indirect Draw: Single drawIndirect(drawIndirectBuffer, 0) call renders all visible splats

Bind Group Layouts

Render Pipeline Layout

The renderer uses two bind groups:

@group(0): PointCloud.renderBindGroupLayout(device)
binding 2: Read-only access to projected splat2D buffer (vertex attributes)
@group(1): GPURSSorter.createRenderBindGroupLayout(device)
binding 0: sorter_uni (read-only GeneralInfo struct with keys_size)
binding 4: payload_a (sorted index buffer for indirect draw)

Global vs Per-Cloud Resources

Global Path (multi-model):
- Single splat2D buffer sized to total capacity
- Single PointCloudSortStuff with global sort buffers
- Single renderBG bind group pointing to global splat2D
- One indirect draw for all models

Per-Cloud Path (legacy single-model):
- Each PointCloud has its own splat2DBuffer (managed by PointCloud)
- Cached PointCloudSortStuff per point cloud (via WeakMap)
- pointCloud.renderBindGroup() binds per-cloud splat2DBuffer
- Separate draw per model (though prepareMulti still recommended)

Preprocessor Selection

The renderer automatically selects the appropriate preprocessor based on PointCloud.colorMode:

'sh' mode: Uses preprocessorSH (initialized with shDegree)
Handles spherical harmonics coefficients (3, 12, 27, or 48 channels for degrees 0 through 3)
Evaluates SH basis functions during preprocessing, so the shared render pipeline reads resolved colors from splat2D
'rgb' mode: Uses preprocessorRGB (initialized with degree 0)
Direct RGB color channels (3 or 4 channels)
No SH evaluation needed

The selection happens in getColorMode() which reads pointCloud.colorMode. Both preprocessors write into the same global splat2D buffer format, ensuring compatibility with the shared render pipeline.
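
A minimal sketch of the selection logic, assuming only the two color modes named above. `PreprocessorPair`, `selectPreprocessor`, and `shChannelCount` are hypothetical names, and the channel formula assumes three color channels (RGB) per SH coefficient.

```typescript
type ColorMode = 'sh' | 'rgb';

// Hypothetical container for the renderer's two preprocessor instances.
interface PreprocessorPair<T> {
  sh: T;
  rgb: T;
}

// Picks the preprocessor that matches the point cloud's color mode.
function selectPreprocessor<T>(pair: PreprocessorPair<T>, mode: ColorMode): T {
  return mode === 'sh' ? pair.sh : pair.rgb;
}

// SH color channels for a degree: (degree + 1)^2 coefficients, 3 channels each.
function shChannelCount(degree: number): number {
  return (degree + 1) ** 2 * 3;
}

const shChannelCounts = [0, 1, 2, 3].map(shChannelCount);
console.log(shChannelCounts); // [3, 12, 27, 48]
```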

Global Buffer Management

Capacity Growth

ensureGlobalCapacity(total) implements dynamic buffer growth:

Calculate needed capacity: Math.max(1, total)
If current globalCapacity >= needed, return early
Grow with 1.25× factor: Math.ceil(needed * 1.25)
Destroy old buffers (if any):
globalBuffers.splat2D.destroy()
sortStuff buffers are owned by the sorter and are not destroyed here; they are released once nothing references them
Allocate new resources:
sorter.createSortStuff(device, newCapacity) → new PointCloudSortStuff
device.createBuffer() for splat2D (size = newCapacity * BUFFER_CONFIG.SPLAT_STRIDE)
Create new renderBG bind group with PointCloud.renderBindGroupLayout
Update globalCapacity and globalBuffers reference
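
The capacity decision in the first three steps can be isolated as a pure function. This is a sketch rather than the renderer's code: `nextGlobalCapacity`, `splat2DByteSize`, and `GROWTH_FACTOR` are hypothetical names, and the stride is passed in because the value of BUFFER_CONFIG.SPLAT_STRIDE is not stated here.

```typescript
const GROWTH_FACTOR = 1.25;

// Returns the new capacity to allocate, or null if the current buffers suffice.
function nextGlobalCapacity(currentCapacity: number, totalPoints: number): number | null {
  const needed = Math.max(1, totalPoints);
  if (currentCapacity >= needed) return null; // early return: nothing to do
  return Math.ceil(needed * GROWTH_FACTOR);
}

// Byte size of the splat2D buffer for a given capacity and per-splat stride.
function splat2DByteSize(capacity: number, splatStride: number): number {
  return capacity * splatStride;
}

console.log(nextGlobalCapacity(1000000, 900000));  // null
console.log(nextGlobalCapacity(1000000, 1200000)); // 1500000
```

Growing past the requested size means that a scene which gains a few points per frame does not trigger a reallocation on every frame.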

Buffer Layout

Global splat2D Buffer:
- Size: globalCapacity * BUFFER_CONFIG.SPLAT_STRIDE
- Usage: STORAGE | COPY_DST | COPY_SRC
- Layout: Per-splat attributes (position, color, covariance, etc.) written by preprocessors

Global Sort Buffers (via PointCloudSortStuff):
- key_a, key_b: Ping-pong depth key buffers (padded to workgroup multiples)
- payload_a, payload_b: Ping-pong index buffers (final sorted order in payload_a)
- sorter_uni: GeneralInfo struct with keys_size (visible splat count)
- sorter_dis: Indirect dispatch buffer with workgroup counts

Depth Pipeline

The renderer supports optional depth testing via a separate pipeline variant:

Creation

createDepthPipeline() creates a depth-aware pipeline with:
- Same shader module and entry points as the standard pipeline
- depthStencil configuration:
  - Format: Configurable (depth24plus default, changeable via setDepthFormat())
  - depthWriteEnabled: false (read-only depth test)
  - depthCompare: 'less' (standard Z-buffer comparison)

Runtime Control

setDepthEnabled(enabled): Toggles useDepth flag
setDepthFormat(format): Updates depthFormat and recreates the depth pipeline

When useDepth && pipelineDepth is true, renderMulti() uses the depth pipeline; otherwise, it uses the standard pipeline. This allows switching between pure back-to-front sorting (no depth) and depth-assisted rendering.
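
The depth configuration and the pipeline choice can be sketched as below. `DepthStencilSketch`, `makeDepthStencil`, and `pickPipeline` are hypothetical names; the object mirrors the depthStencil field of a WebGPU render pipeline descriptor without depending on WebGPU type definitions.

```typescript
// Structural stand-in for the depthStencil field of a render pipeline descriptor.
interface DepthStencilSketch {
  format: string;
  depthWriteEnabled: boolean;
  depthCompare: 'less';
}

// Read-only depth test: splats are tested against existing depth but never write it.
function makeDepthStencil(format: string = 'depth24plus'): DepthStencilSketch {
  return { format, depthWriteEnabled: false, depthCompare: 'less' };
}

// Mirrors the `useDepth && pipelineDepth` check: fall back to the standard
// pipeline when depth is disabled or no depth pipeline exists.
function pickPipeline<P>(useDepth: boolean, standard: P, depth: P | null): P {
  return useDepth && depth !== null ? depth : standard;
}
```

Keeping depthWriteEnabled false lets splats be occluded by previously rendered opaque geometry while their own ordering still comes from the sort.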

Render Settings Merging

buildRenderSettings() merges RenderArgs with per-cloud metadata:

Setting	Source Priority
`maxSHDegree`	`min(args.maxSHDegree ?? pointCloud.shDeg, renderer.shDegree)`
`showEnvMap`	`args.showEnvMap ?? true`
`mipSplatting`	`args.mipSplatting ?? pointCloud.mipSplatting ?? false`
`kernelSize`	`args.kernelSize ?? pointCloud.kernelSize ?? DEFAULT_KERNEL_SIZE`
`walltime`	`args.walltime ?? 1.0`
`sceneExtend`	`args.sceneExtend ?? computed sceneSize`
`center`	`args.sceneCenter ?? pointCloud.center`
`clippingBoxMin/Max`	`args.clippingBox ?? pointCloud.bbox`

These settings are passed to preprocessor.dispatchModel() and written into the preprocessor's uniform buffer for shader consumption.
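
The fallback chains in the table translate directly into nullish-coalescing expressions. The sketch below covers a subset of the settings; `RenderArgsSketch`, `CloudMetaSketch`, and `mergeRenderSettings` are hypothetical names, and the default kernel size is a parameter because the value of DEFAULT_KERNEL_SIZE is not given here.

```typescript
interface RenderArgsSketch {
  maxSHDegree?: number;
  showEnvMap?: boolean;
  mipSplatting?: boolean;
  kernelSize?: number;
  walltime?: number;
}

interface CloudMetaSketch {
  shDeg: number;
  mipSplatting?: boolean;
  kernelSize?: number;
}

function mergeRenderSettings(
  args: RenderArgsSketch,
  cloud: CloudMetaSketch,
  rendererShDegree: number,
  defaultKernelSize: number, // stands in for DEFAULT_KERNEL_SIZE
) {
  return {
    // Never exceed the degree the SH preprocessor was initialized with.
    maxSHDegree: Math.min(args.maxSHDegree ?? cloud.shDeg, rendererShDegree),
    showEnvMap: args.showEnvMap ?? true,
    // Explicit args win, then per-cloud metadata, then the global default.
    mipSplatting: args.mipSplatting ?? cloud.mipSplatting ?? false,
    kernelSize: args.kernelSize ?? cloud.kernelSize ?? defaultKernelSize,
    walltime: args.walltime ?? 1.0,
  };
}
```

Using `??` rather than `||` matters here: an explicit `false` or `0` in the arguments is respected instead of being replaced by the fallback.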

Integration Points

Point Cloud Module

Bind Group Layouts: PointCloud.renderBindGroupLayout(device) provides the @group(0) layout
Per-Cloud Resources: pointCloud.renderBindGroup() returns bind group for per-cloud rendering
Transform Matrix: pointCloud.transform (4×4 matrix) passed to preprocessor for model-space projection
Metadata: bbox, center, shDeg, colorMode, kernelSize, mipSplatting used for render settings
ONNX Support: DynamicPointCloud.countBuffer() provides optional draw count for indirect pipelines

Preprocess Module

Dual Preprocessors: Two GaussianPreprocessor instances handle SH and RGB models
Dispatch Interface: dispatchModel() writes splats into global splat2D buffer at specified baseOffset
Counter Updates: Preprocessors atomically increment sorter_uni.keys_size and sorter_dis.dispatch_x
Settings Injection: Render settings (kernel size, SH degree, clipping box, etc.) written to preprocessor uniforms

Sorting Module

Single Sorter: One GPURSSorter instance shared across all models
Layouts: GPURSSorter.createRenderBindGroupLayout(device) provides @group(1) layout
Sort Resources: sorter.createSortStuff(device, capacity) allocates global sort buffers
Indirect Sort: recordSortIndirect() processes all models in one pass using counters from preprocessing
Payload Access: Sorted payload_a buffer provides indices for indirect draw

Shader Module

Gaussian Shader: gaussianShader (from src/shaders/index) implements vertex and fragment stages
Storage Access: Vertex shader reads all attributes from @group(0) storage buffers
Blending: Fragment shader uses premultiplied alpha blending (src: one, dst: one-minus-src-alpha)
Primitive: Triangle strip topology (4 vertices per splat)
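
The blend factors above can be written as a WebGPU-style blend component, with a CPU reference of the arithmetic they produce. This is a sketch: `premultipliedBlendComponent` and `blendPremultiplied` are hypothetical names, and the 'add' operation and the use of the same factors for the alpha channel are assumptions not stated in this document.

```typescript
// Mirrors the shape of a WebGPU blend component; 'add' is an assumed operation.
const premultipliedBlendComponent = {
  srcFactor: 'one',
  dstFactor: 'one-minus-src-alpha',
  operation: 'add',
} as const;

type Rgba = [number, number, number, number];

// CPU reference of what the blend unit computes per channel:
// out = src * 1 + dst * (1 - src.alpha), with src already multiplied by alpha.
function blendPremultiplied(src: Rgba, dst: Rgba): Rgba {
  const keep = 1 - src[3];
  return [
    src[0] + dst[0] * keep,
    src[1] + dst[1] * keep,
    src[2] + dst[2] * keep,
    src[3] + dst[3] * keep,
  ];
}

// A half-transparent red splat over opaque blue.
console.log(blendPremultiplied([0.5, 0, 0, 0.5], [0, 0, 1, 1])); // [0.5, 0, 0.5, 1]
```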

Debug & Diagnostics

Statistics

getRenderStats(pointCloud) returns:
- gaussianCount: Total points in the point cloud
- visibleSplats: Latest keys_size from sorter (cached num_points if available)
- memoryUsage: Coarse estimate (Gaussian + SH buffers + sort buffers)

Debug Helpers

readInstanceCountDebug(): GPU→CPU readback of drawIndirectBuffer[4:8] (instance count)
readPayloadSampleDebug(n): Dumps first n payload indices from global payload_a buffer
debugONNXCount(): Chains into preprocessor debug flow to trace ONNX-driven count buffers

Debug Logging

Enable verbose logging via (globalThis as any).GS_DEBUG_LOGS = true. The renderer logs:
- Capacity growth events
- Per-model dispatch offsets
- Global sort completion
- Instance count updates
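
A flag-gated logger of this kind can be sketched as follows. `gsDebugLog` and the `[GaussianRenderer]` prefix are hypothetical and only illustrate the pattern; the renderer's own log format is not specified here.

```typescript
// Hypothetical logging helper gated on the global flag described above.
// Returns true when a message was actually written.
function gsDebugLog(...parts: unknown[]): boolean {
  if ((globalThis as any).GS_DEBUG_LOGS !== true) return false;
  console.log('[GaussianRenderer]', ...parts);
  return true;
}

(globalThis as any).GS_DEBUG_LOGS = true;
gsDebugLog('capacity grew to', 1250000);
```

Checking the flag at call time means logging can be switched on from the browser console without reloading the page.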

Performance Considerations

Resource Reuse

Static Resources: Pipelines, layouts, and the indirect buffer are created once and reused for the renderer's lifetime (the depth pipeline is rebuilt only on a format change)
Per-Cloud Caching: Sort resources cached in WeakMap, rebuilt only on point count change
Global Buffers: Grow with 1.25× factor to reduce reallocation frequency
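
The per-cloud caching rule (rebuild only when the point count changes) can be sketched with a generic WeakMap wrapper. `PerCloudSortCache` is a hypothetical class; `C` stands in for PointCloud and `S` for PointCloudSortStuff.

```typescript
// Hypothetical cache illustrating the WeakMap pattern described above.
class PerCloudSortCache<C extends object, S> {
  private entries = new WeakMap<C, { numPoints: number; stuff: S }>();
  private createStuff: (numPoints: number) => S;

  constructor(createStuff: (numPoints: number) => S) {
    this.createStuff = createStuff;
  }

  // Returns cached resources, rebuilding only when the point count changed.
  get(cloud: C, numPoints: number): S {
    const hit = this.entries.get(cloud);
    if (hit !== undefined && hit.numPoints === numPoints) return hit.stuff;
    const stuff = this.createStuff(numPoints);
    this.entries.set(cloud, { numPoints, stuff });
    return stuff;
  }
}
```

A WeakMap keys entries by object identity without keeping the point cloud alive, so cache entries become collectable once the application drops its own reference to the cloud.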

Multi-Model Batching

Single Sort: One radix sort pass handles all models together
Single Draw: One indirect draw call renders all visible splats
Reduced Overhead: Eliminates per-model pipeline switches and draw calls

Memory Footprint

Global buffers scale with total point count:
- splat2D: capacity * SPLAT_STRIDE bytes
- Sort buffers: capacity * (key_size + payload_size) * 2 (ping-pong) + histogram scratch

For scenes with many small models, the global buffers may use more memory than per-model buffers would, a cost that is offset by the batching benefits.
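
The footprint formulas above can be turned into a small estimator. This is a rough sketch: `MemoryEstimateInput` and `estimateGlobalBufferBytes` are hypothetical names, every size is supplied by the caller, and the workgroup padding of the key buffers is ignored. The numbers in the example (32-byte stride, 4-byte keys and payloads, no histogram scratch) are illustrative, not values taken from the renderer.

```typescript
interface MemoryEstimateInput {
  capacity: number;       // splat capacity of the global buffers
  splatStride: number;    // bytes per splat (BUFFER_CONFIG.SPLAT_STRIDE)
  keyBytes: number;       // bytes per sort key
  payloadBytes: number;   // bytes per payload index
  histogramBytes: number; // sorter scratch space, implementation-specific
}

function estimateGlobalBufferBytes(m: MemoryEstimateInput): { splat2D: number; sort: number; total: number } {
  const splat2D = m.capacity * m.splatStride;
  // Keys and payloads each exist twice because of ping-pong buffering.
  const sort = m.capacity * (m.keyBytes + m.payloadBytes) * 2 + m.histogramBytes;
  return { splat2D, sort, total: splat2D + sort };
}

const exampleEstimate = estimateGlobalBufferBytes({
  capacity: 1000000,
  splatStride: 32,
  keyBytes: 4,
  payloadBytes: 4,
  histogramBytes: 0,
});
console.log(exampleEstimate.total); // 48000000
```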

Common Patterns

Multi-Model Frame

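// One-time setup: creates the sorter, preprocessors, pipelines, and global buffers
await renderer.initialize();

// Per frame: preprocess every model, sort globally, update the indirect draw count
renderer.prepareMulti(encoder, queue, pointClouds, {
  camera,
  viewport: [width, height],
  maxSHDegree: 3,
});

// Per frame: a single indirect draw renders all models
const pass = encoder.beginRenderPass(passDesc);
renderer.renderMulti(pass, pointClouds);
pass.end();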

Depth-Enabled Rendering

renderer.setDepthFormat('depth32float');
renderer.setDepthEnabled(true);
// Subsequent renderMulti() calls use depth pipeline

Per-Model Rendering (Legacy)

Legacy Path refers to the per-model rendering approach used before the introduction of multi-model batching (prepareMulti/renderMulti). While still supported, the batched approach is recommended even for single models.

Legacy Path Characteristics:
- Uses render(pass, pointCloud) method, called separately for each model
- Uses each point cloud's own splat2DBuffer (managed by PointCloud module)
- Uses cached per-cloud sort resources (WeakMap<PointCloud, PointCloudSortStuff>)
- Executes separate draw calls for each model

// Still uses prepareMulti for preprocessing (recommended)
renderer.prepareMulti(encoder, queue, [pointCloud], args);
// Legacy path: uses render() instead of renderMulti()
renderer.render(pass, pointCloud); // Uses per-cloud cache, separate draw

Note: Even for a single model, renderMulti() is recommended as it uses global buffers and offers better performance.

Troubleshooting

Capacity Exceeded: If total points exceed globalCapacity, buffers are reallocated. Expect a brief frame spike but no crash.
Mixed Color Modes: Ensure PointCloud.colorMode is set correctly ('sh' or 'rgb') so the renderer selects the right preprocessor.
Depth Artifacts: Enable depth pipeline for Z-buffer testing, or disable it for pure back-to-front sorting.
Zero Instance Count: Always call prepareMulti before renderMulti; prepareMulti copies keys_size into the indirect buffer's instanceCount, which otherwise stays at 0.
Stale Sort Results: Ensure prepareMulti runs every frame; sort resources are reset at the start of each preparation phase.

The renderer batches multiple Gaussian splat models into a single global sort and a single indirect draw, which keeps per-frame CPU overhead low and avoids per-model pipeline switches and draw calls.