Renderer Module Architecture
The renderer module orchestrates the complete rendering pipeline for Gaussian splatting, coordinating preprocessing, GPU radix sorting, and indirect draw execution. GaussianRenderer serves as the high-level coordinator that manages static GPU resources, dual preprocessors (SH and RGB), a shared sorter instance, and global buffers for multi-model batching.
High-level Pipeline
┌────────────────────────────────────────────────────────────┐
│  Application Layer                                         │
│  RenderLoop / App → prepareMulti() / renderMulti()         │
└───────────────▲────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│  Renderer Layer                                            │
│  GaussianRenderer                                          │
│    • Static pipelines (render, depth)                      │
│    • Dual preprocessors (SH, RGB)                          │
│    • Single GPURSSorter instance                           │
│    • Global buffers (splat2D, sorter payload)              │
└───────────────┬────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│  Preparation Phase (prepareMulti)                          │
│  1. ensureGlobalCapacity(total points)                     │
│  2. reset sorter indirect buffers                          │
│  3. For each PointCloud:                                   │
│     • Select preprocessor (SH vs RGB)                      │
│     • dispatchModel(baseOffset, countBuffer?)              │
│  4. recordSortIndirect(global sort)                        │
│  5. Copy keys_size → drawIndirect.instanceCount            │
└───────────────┬────────────────────────────────────────────┘
                │
┌───────────────┴────────────────────────────────────────────┐
│  Rendering Phase (renderMulti)                             │
│  1. Bind global renderBG (@group(0))                       │
│  2. Bind global sorter_render_bg (@group(1))               │
│  3. Set pipeline (depth-aware if enabled)                  │
│  4. drawIndirect(once for all models)                      │
└────────────────────────────────────────────────────────────┘
Key Design Principles
- Static Resource Caching: Pipelines, layouts, and indirect buffers are created once during initialize() and reused across frames
- Dual Preprocessor Support: Separate GaussianPreprocessor instances for SH (spherical harmonics) and RGB models enable efficient color mode switching
- Global Sorting: Single shared sorter instance with global buffers allows multi-model batching into one indirect draw
- Per-Cloud Caching: WeakMap<PointCloud, PointCloudSortStuff> caches sort resources per point cloud, rebuilding only when point count changes
- Capacity Management: Global buffers grow with a 1.25× factor to amortize reallocations when total point count increases
- Depth Pipeline Variant: Optional depth-enabled pipeline created on demand when setDepthFormat() is called
Resource Lifecycle
Initialization (initialize())
- Sorter Creation: GPURSSorter.create(device, queue) (async initialization with adapter testing)
- Preprocessor Setup: Two instances created:
  - preprocessorSH: Initialized with shDegree (default 3)
  - preprocessorRGB: Initialized with degree 0 for direct RGB models
- Pipeline Layout: Combines PointCloud.renderBindGroupLayout and GPURSSorter.createRenderBindGroupLayout
- Render Pipelines: Two variants created:
  - Standard pipeline: No depth testing
  - Depth pipeline: Created with initial depth24plus format (recreated on format change)
- Indirect Draw Buffer: 16-byte buffer initialized with {vertexCount: 4, instanceCount: 0, firstVertex: 0, firstInstance: 0}
- Global Buffers: Initial 1M-splat capacity allocated (splat2D buffer + PointCloudSortStuff)
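The 16-byte indirect draw buffer described above can be pictured as four consecutive 32-bit unsigned integers. The following is a minimal sketch of that layout only; makeDrawIndirectArgs and INSTANCE_COUNT_BYTE_OFFSET are hypothetical names used for illustration, not part of the renderer's API.

```typescript
// Hypothetical helper: packs the initial drawIndirect arguments in the order
// WebGPU expects (vertexCount, instanceCount, firstVertex, firstInstance).
const INSTANCE_COUNT_BYTE_OFFSET = 4; // second u32, i.e. bytes [4:8]

function makeDrawIndirectArgs(): Uint32Array {
  // 4 vertices per splat quad; instanceCount starts at 0 and is filled in
  // later from the sorter's keys_size.
  return new Uint32Array([4, 0, 0, 0]);
}

const initialArgs = makeDrawIndirectArgs();
console.log(initialArgs.byteLength); // 16
```

Because instanceCount is the second field, a 4-byte copy to offset 4 is enough to update the draw without touching the other three values.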
Frame Preparation (prepareMulti)
- Capacity Check: Sum all pointCloud.numPoints, grow global buffers if needed (1.25× growth factor)
- Sorter Reset: recordResetIndirectBuffer clears keys_size and dispatch_x counters
- Per-Model Dispatch:
  - Calculate baseOffset for each model (cumulative sum of previous models' point counts)
  - Detect color mode: pointCloud.colorMode → select preprocessorSH or preprocessorRGB
  - Build render settings: Merge RenderArgs with per-cloud metadata (bbox, center, kernelSize, etc.)
  - Call preprocessor.dispatchModel() with:
    - Global splat2D buffer and sortStuff
    - Model-specific baseOffset and transform matrix
    - Optional ONNX countBuffer for dynamic models
- Global Sort: Single recordSortIndirect call processes all models together
- Indirect Draw Update: Copy sorter_uni.keys_size (4 bytes) into drawIndirectBuffer[4:8] to set instance count
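The baseOffset bookkeeping in the per-model dispatch step is a running sum. A minimal sketch, assuming each point cloud exposes numPoints as described above; CloudLike and computeBaseOffsets are hypothetical names introduced for this example.

```typescript
// Hypothetical shape: only the field this sketch needs.
interface CloudLike {
  numPoints: number;
}

// Returns each model's starting slot in the global splat2D buffer, plus the
// total that would be passed to the capacity check.
function computeBaseOffsets(clouds: CloudLike[]): { offsets: number[]; total: number } {
  const offsets: number[] = [];
  let total = 0;
  for (const cloud of clouds) {
    offsets.push(total); // cumulative sum of all previous models
    total += cloud.numPoints;
  }
  return { offsets, total };
}

const plan = computeBaseOffsets([{ numPoints: 1000 }, { numPoints: 250 }, { numPoints: 4000 }]);
console.log(plan.offsets, plan.total); // [ 0, 1000, 1250 ] 5250
```

Each model writes its splats into a disjoint slice of the global buffer, which is what lets a single sort and a single draw cover all of them.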
Frame Rendering (renderMulti)
- Bind Groups:
  - @group(0): Global renderBG (binds global splat2D buffer)
  - @group(1): Global sorter_render_bg (binds sorter_uni and payload_a)
- Pipeline Selection: Use depth pipeline if useDepth && pipelineDepth exists, otherwise standard pipeline
- Indirect Draw: Single drawIndirect(drawIndirectBuffer, 0) call renders all visible splats
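The rendering steps above amount to four calls on the render pass. The sketch below records them against a minimal stand-in interface so it runs without a GPU; PassLike, RenderState, and recordDraw are hypothetical names, and the real renderMulti() may differ in detail.

```typescript
// Minimal stand-in for the parts of GPURenderPassEncoder used here.
// Resource types are left opaque.
interface PassLike {
  setPipeline(pipeline: object): void;
  setBindGroup(index: number, bindGroup: object): void;
  drawIndirect(indirectBuffer: object, indirectOffset: number): void;
}

// Hypothetical bundle of the renderer state that the draw step reads.
interface RenderState {
  pipeline: object;
  pipelineDepth: object | null;
  useDepth: boolean;
  renderBG: object;
  sorterRenderBG: object;
  drawIndirectBuffer: object;
}

function recordDraw(pass: PassLike, s: RenderState): void {
  pass.setBindGroup(0, s.renderBG);       // global splat2D
  pass.setBindGroup(1, s.sorterRenderBG); // sorter_uni + payload_a
  // Depth variant only when it is both enabled and available.
  const usesDepth = s.useDepth && s.pipelineDepth !== null;
  pass.setPipeline(usesDepth ? (s.pipelineDepth as object) : s.pipeline);
  pass.drawIndirect(s.drawIndirectBuffer, 0); // one draw for every model
}
```

The instance count is never passed from the CPU here; the GPU reads it from the indirect buffer that the preparation phase filled in.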
Bind Group Layouts
Render Pipeline Layout
The renderer uses two bind groups:
- @group(0): PointCloud.renderBindGroupLayout(device)
  - binding 2: Read-only access to projected splat2D buffer (vertex attributes)
- @group(1): GPURSSorter.createRenderBindGroupLayout(device)
  - binding 0: sorter_uni (read-only GeneralInfo struct with keys_size)
  - binding 4: payload_a (sorted index buffer for indirect draw)
Global vs Per-Cloud Resources
Global Path (multi-model):
- Single splat2D buffer sized to total capacity
- Single PointCloudSortStuff with global sort buffers
- Single renderBG bind group pointing to global splat2D
- One indirect draw for all models
Per-Cloud Path (legacy single-model):
- Each PointCloud has its own splat2DBuffer (managed by PointCloud)
- Cached PointCloudSortStuff per point cloud (via WeakMap)
- pointCloud.renderBindGroup() binds per-cloud splat2DBuffer
- Separate draw per model (though prepareMulti still recommended)
Preprocessor Selection
The renderer automatically selects the appropriate preprocessor based on PointCloud.colorMode:
- 'sh' mode: Uses preprocessorSH (initialized with shDegree)
  - Handles spherical harmonics coefficients (4, 12, 27, or 48 channels)
  - Evaluates SH basis functions in the fragment shader
- 'rgb' mode: Uses preprocessorRGB (initialized with degree 0)
  - Direct RGB color channels (3 or 4 channels)
  - No SH evaluation needed
The selection happens in getColorMode() which reads pointCloud.colorMode. Both preprocessors write into the same global splat2D buffer format, ensuring compatibility with the shared render pipeline.
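The selection rule can be sketched as a single branch on colorMode. This is an illustration under assumptions: PreprocessorLike is a hypothetical stand-in for GaussianPreprocessor, and treating any value other than 'rgb' as SH is a choice made for this example rather than documented behavior.

```typescript
type ColorMode = 'sh' | 'rgb';

// Hypothetical stand-in for a GaussianPreprocessor instance.
interface PreprocessorLike {
  label: string;
  shDegree: number;
}

const preprocessorSH: PreprocessorLike = { label: 'SH', shDegree: 3 };
const preprocessorRGB: PreprocessorLike = { label: 'RGB', shDegree: 0 };

// Picks the preprocessor for one point cloud based on its color mode.
function selectPreprocessor(cloud: { colorMode: ColorMode }): PreprocessorLike {
  return cloud.colorMode === 'rgb' ? preprocessorRGB : preprocessorSH;
}

console.log(selectPreprocessor({ colorMode: 'rgb' }).label); // RGB
```

Since both preprocessors emit the same splat2D format, a frame can mix SH and RGB models without changing the render pipeline.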
Global Buffer Management
Capacity Growth
ensureGlobalCapacity(total) implements dynamic buffer growth:
- Calculate needed capacity: Math.max(1, total)
- If current globalCapacity >= needed, return early
- Grow with 1.25× factor: Math.ceil(needed * 1.25)
- Destroy old buffers (if any):
  - globalBuffers.splat2D.destroy()
  - sortStuff buffers are owned by sorter (GC'd when unused)
- Allocate new resources:
  - sorter.createSortStuff(device, newCapacity) → new PointCloudSortStuff
  - device.createBuffer() for splat2D (size = newCapacity * BUFFER_CONFIG.SPLAT_STRIDE)
- Create new renderBG bind group with PointCloud.renderBindGroupLayout
- Update globalCapacity and globalBuffers reference
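The sizing decision in the first three steps can be isolated as a pure function. A minimal sketch; nextCapacity and GROWTH_FACTOR are hypothetical names, and buffer destruction and allocation are left out.

```typescript
const GROWTH_FACTOR = 1.25;

// Returns the capacity the global buffers should have after a request for
// `total` splats: unchanged if they already fit, otherwise grown with headroom.
function nextCapacity(currentCapacity: number, total: number): number {
  const needed = Math.max(1, total);
  if (currentCapacity >= needed) return currentCapacity;
  return Math.ceil(needed * GROWTH_FACTOR);
}

console.log(nextCapacity(1000000, 900000));  // 1000000
console.log(nextCapacity(1000000, 1200000)); // 1500000
```

The 25% headroom means a scene that grows by a few points per frame triggers a reallocation only occasionally rather than on every frame.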
Buffer Layout
Global splat2D Buffer:
- Size: globalCapacity * BUFFER_CONFIG.SPLAT_STRIDE
- Usage: STORAGE | COPY_DST | COPY_SRC
- Layout: Per-splat attributes (position, color, covariance, etc.) written by preprocessors
Global Sort Buffers (via PointCloudSortStuff):
- key_a, key_b: Ping-pong depth key buffers (padded to workgroup multiples)
- payload_a, payload_b: Ping-pong index buffers (final sorted order in payload_a)
- sorter_uni: GeneralInfo struct with keys_size (visible splat count)
- sorter_dis: Indirect dispatch buffer with workgroup counts
Depth Pipeline
The renderer supports optional depth testing via a separate pipeline variant:
Creation
createDepthPipeline() creates a depth-aware pipeline with:
- Same shader module and entry points as standard pipeline
- depthStencil configuration:
  - Format: Configurable (depth24plus default, changeable via setDepthFormat())
  - depthWriteEnabled: false (read-only depth test)
  - depthCompare: 'less' (standard Z-buffer comparison)
Runtime Control
- setDepthEnabled(enabled): Toggles useDepth flag
- setDepthFormat(format): Updates depthFormat and recreates the depth pipeline
When useDepth && pipelineDepth is true, renderMulti() uses the depth pipeline; otherwise, it uses the standard pipeline. This allows switching between pure back-to-front sorting (no depth) and depth-assisted rendering.
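The depth configuration and its runtime controls can be modeled without a GPU. In this sketch, makeDepthStencilState returns an object shaped like WebGPU's GPUDepthStencilState, while DepthPipelineState and buildCount are hypothetical and exist only to show when a rebuild would occur. Whether the real setDepthFormat() skips the rebuild for an unchanged format is not stated in the source; this sketch always rebuilds.

```typescript
// Hypothetical helper mirroring the depthStencil settings listed above.
function makeDepthStencilState(format: string) {
  return {
    format,
    depthWriteEnabled: false,      // read-only depth test
    depthCompare: 'less' as const, // standard Z-buffer comparison
  };
}

// Toy model of the runtime controls; buildCount shows when a rebuild happens.
class DepthPipelineState {
  useDepth = false;
  depthFormat = 'depth24plus';
  depthStencil = makeDepthStencilState(this.depthFormat);
  buildCount = 1;

  setDepthEnabled(enabled: boolean): void {
    this.useDepth = enabled; // flag only; no pipeline work
  }

  setDepthFormat(format: string): void {
    this.depthFormat = format;
    this.depthStencil = makeDepthStencilState(format); // recreate
    this.buildCount += 1;
  }
}

const depthState = new DepthPipelineState();
depthState.setDepthFormat('depth32float');
console.log(depthState.depthStencil.format, depthState.buildCount); // depth32float 2
```

Toggling the flag is cheap, whereas changing the format implies building a new pipeline, so format changes are best kept out of the per-frame path.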
Render Settings Merging
buildRenderSettings() merges RenderArgs with per-cloud metadata:
| Setting | Source Priority |
|---|---|
| maxSHDegree | min(args.maxSHDegree ?? pointCloud.shDeg, renderer.shDegree) |
| showEnvMap | args.showEnvMap ?? true |
| mipSplatting | args.mipSplatting ?? pointCloud.mipSplatting ?? false |
| kernelSize | args.kernelSize ?? pointCloud.kernelSize ?? DEFAULT_KERNEL_SIZE |
| walltime | args.walltime ?? 1.0 |
| sceneExtend | args.sceneExtend ?? computed sceneSize |
| center | args.sceneCenter ?? pointCloud.center |
| clippingBoxMin/Max | args.clippingBox ?? pointCloud.bbox |
These settings are passed to preprocessor.dispatchModel() and written into the preprocessor's uniform buffer for shader consumption.
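The priority rules in the table map directly onto nullish-coalescing chains. The sketch below is illustrative only: the interface shapes, the DEFAULT_KERNEL_SIZE value, and the use of the bounding-box diagonal as the computed scene size are all assumptions, since the source does not specify them.

```typescript
type Vec3 = [number, number, number];
interface Box { min: Vec3; max: Vec3; }

interface RenderArgsLike {
  maxSHDegree?: number;
  showEnvMap?: boolean;
  mipSplatting?: boolean;
  kernelSize?: number;
  walltime?: number;
  sceneExtend?: number;
  sceneCenter?: Vec3;
  clippingBox?: Box;
}

interface CloudMeta {
  shDeg: number;
  mipSplatting?: boolean;
  kernelSize?: number;
  center: Vec3;
  bbox: Box;
}

const DEFAULT_KERNEL_SIZE = 0.3; // placeholder value, not taken from the source

function buildRenderSettings(args: RenderArgsLike, cloud: CloudMeta, rendererShDegree: number) {
  const box = args.clippingBox ?? cloud.bbox;
  // Assumption: scene size is taken as the bounding-box diagonal.
  const sceneSize = Math.hypot(
    cloud.bbox.max[0] - cloud.bbox.min[0],
    cloud.bbox.max[1] - cloud.bbox.min[1],
    cloud.bbox.max[2] - cloud.bbox.min[2],
  );
  return {
    maxSHDegree: Math.min(args.maxSHDegree ?? cloud.shDeg, rendererShDegree),
    showEnvMap: args.showEnvMap ?? true,
    mipSplatting: args.mipSplatting ?? cloud.mipSplatting ?? false,
    kernelSize: args.kernelSize ?? cloud.kernelSize ?? DEFAULT_KERNEL_SIZE,
    walltime: args.walltime ?? 1.0,
    sceneExtend: args.sceneExtend ?? sceneSize,
    center: args.sceneCenter ?? cloud.center,
    clippingBoxMin: box.min,
    clippingBoxMax: box.max,
  };
}
```

Using ?? rather than || matters here: an explicit false for mipSplatting or 0 for walltime is respected instead of being replaced by the fallback.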
Integration Points
Point Cloud Module
- Bind Group Layouts: PointCloud.renderBindGroupLayout(device) provides the @group(0) layout
- Per-Cloud Resources: pointCloud.renderBindGroup() returns bind group for per-cloud rendering
- Transform Matrix: pointCloud.transform (4×4 matrix) passed to preprocessor for model-space projection
- Metadata: bbox, center, shDeg, colorMode, kernelSize, mipSplatting used for render settings
- ONNX Support: DynamicPointCloud.countBuffer() provides optional draw count for indirect pipelines
Preprocess Module
- Dual Preprocessors: Two GaussianPreprocessor instances handle SH and RGB models
- Dispatch Interface: dispatchModel() writes splats into global splat2D buffer at specified baseOffset
- Counter Updates: Preprocessors atomically increment sorter_uni.keys_size and sorter_dis.dispatch_x
- Settings Injection: Render settings (kernel size, SH degree, clipping box, etc.) written to preprocessor uniforms
Sorting Module
- Single Sorter: One GPURSSorter instance shared across all models
- Layouts: GPURSSorter.createRenderBindGroupLayout(device) provides @group(1) layout
- Sort Resources: sorter.createSortStuff(device, capacity) allocates global sort buffers
- Indirect Sort: recordSortIndirect() processes all models in one pass using counters from preprocessing
- Payload Access: Sorted payload_a buffer provides indices for indirect draw
Shader Module
- Gaussian Shader: gaussianShader (from src/shaders/index) implements vertex and fragment stages
- Storage Access: Vertex shader reads all attributes from @group(0) storage buffers
- Blending: Fragment shader uses premultiplied alpha blending (src: one, dst: one-minus-src-alpha)
- Primitive: Triangle strip topology (4 vertices per splat)
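To make the blend factors concrete, here is a CPU-side worked example of premultiplied alpha compositing with src factor one and dst factor one-minus-src-alpha. It is a sketch of the arithmetic only; blendPremultiplied is a hypothetical helper, and applying the same factors to the alpha channel is an assumption.

```typescript
type RGBA = [number, number, number, number];

// Premultiplied "over": result = src * 1 + dst * (1 - srcAlpha).
function blendPremultiplied(src: RGBA, dst: RGBA): RGBA {
  const k = 1 - src[3];
  return [
    src[0] + dst[0] * k,
    src[1] + dst[1] * k,
    src[2] + dst[2] * k,
    src[3] + dst[3] * k,
  ];
}

// A half-transparent red splat (already premultiplied) over an opaque blue pixel.
const blended = blendPremultiplied([0.5, 0, 0, 0.5], [0, 0, 1, 1]);
console.log(blended); // [ 0.5, 0, 0.5, 1 ]
```

Because each splat attenuates what lies behind it, the result depends on draw order, which is why the sorted payload_a indices drive the draw.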
Debug & Diagnostics
Statistics
getRenderStats(pointCloud) returns:
- gaussianCount: Total points in the point cloud
- visibleSplats: Latest keys_size from sorter (cached num_points if available)
- memoryUsage: Coarse estimate (Gaussian + SH buffers + sort buffers)
Debug Helpers
- readInstanceCountDebug(): GPU→CPU readback of drawIndirectBuffer[4:8] (instance count)
- readPayloadSampleDebug(n): Dumps first n payload indices from global payload_a buffer
- debugONNXCount(): Chains into preprocessor debug flow to trace ONNX-driven count buffers
Debug Logging
Enable verbose logging via (globalThis as any).GS_DEBUG_LOGS = true. The renderer logs:
- Capacity growth events
- Per-model dispatch offsets
- Global sort completion
- Instance count updates
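A flag-gated logger of this kind might look like the following. This is a sketch only: debugLog and the log prefix are hypothetical, and the renderer's actual logging code is not shown in the source.

```typescript
// Hypothetical logging helper gated on the global flag described above.
// Returns true when a message was emitted, which makes it easy to test.
function debugLog(...parts: unknown[]): boolean {
  if ((globalThis as any).GS_DEBUG_LOGS !== true) return false;
  console.log('[GaussianRenderer]', ...parts);
  return true;
}

(globalThis as any).GS_DEBUG_LOGS = true;
debugLog('capacity grown to', 1250000);
```

Checking the flag at call time means logging can be switched on from the browser console mid-session without reloading the page.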
Performance Considerations
Resource Reuse
- Static Resources: Pipelines, layouts, and indirect buffer created once, reused forever
- Per-Cloud Caching: Sort resources cached in WeakMap, rebuilt only on point count change
- Global Buffers: Grow with 1.25× factor to reduce reallocation frequency
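The per-cloud caching rule can be sketched with a plain WeakMap. CloudKey, SortStuffLike, getSortStuff, and the allocations counter are hypothetical stand-ins; in the renderer the cached value would hold GPU buffers rather than a number.

```typescript
// Hypothetical stand-ins for PointCloud and PointCloudSortStuff.
interface CloudKey { numPoints: number; }
interface SortStuffLike { numPoints: number; }

const sortCache = new WeakMap<CloudKey, SortStuffLike>();
let allocations = 0; // counts how often resources had to be (re)built

function getSortStuff(cloud: CloudKey): SortStuffLike {
  const cached = sortCache.get(cloud);
  if (cached && cached.numPoints === cloud.numPoints) return cached; // reuse
  allocations += 1; // the real renderer would allocate GPU buffers here
  const fresh: SortStuffLike = { numPoints: cloud.numPoints };
  sortCache.set(cloud, fresh);
  return fresh;
}
```

Keying on the point cloud object through a WeakMap lets the cache entry become collectable once the application drops its last reference to that cloud.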
Multi-Model Batching
- Single Sort: One radix sort pass handles all models together
- Single Draw: One indirect draw call renders all visible splats
- Reduced Overhead: Eliminates per-model pipeline switches and draw calls
Memory Footprint
Global buffers scale with total point count:
- splat2D: capacity * SPLAT_STRIDE bytes
- Sort buffers: capacity * (key_size + payload_size) * 2 (ping-pong) + histogram scratch
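For scenes with many small models, the global buffers (sized with 1.25× headroom) may use more memory than the sum of exact per-model allocations. The savings from a single sort and a single draw call usually outweigh this extra memory.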
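The formulas above translate into a small estimator. This is a rough sketch under stated assumptions: keys and payload entries are taken to be 4 bytes each, padding to workgroup multiples is ignored, and the 32-byte stride in the usage line is an illustrative value rather than the actual BUFFER_CONFIG.SPLAT_STRIDE.

```typescript
// Rough estimate following the formulas above. splatStride and scratchBytes
// must be supplied by the caller because the source does not state them.
function estimateGlobalBytes(
  capacity: number,
  splatStride: number,
  scratchBytes = 0,
  keyBytes = 4,     // assumption: u32 depth keys
  payloadBytes = 4, // assumption: u32 indices
): number {
  const splat2D = capacity * splatStride;
  // Keys and payloads each exist twice (ping-pong buffers).
  const sortBuffers = capacity * (keyBytes + payloadBytes) * 2 + scratchBytes;
  return splat2D + sortBuffers;
}

console.log(estimateGlobalBytes(1000000, 32)); // 48000000
```

Passing the grown capacity rather than the raw point count gives the figure that is actually allocated.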
Common Patterns
Multi-Model Frame
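// device, queue, and passDesc are assumed to come from the application's WebGPU setup.
// One-time setup: builds pipelines, preprocessors, the sorter, and global buffers.
await renderer.initialize();
// Per frame: record preprocessing and the global sort into the command encoder.
const encoder = device.createCommandEncoder();
renderer.prepareMulti(encoder, queue, pointClouds, {
  camera,
  viewport: [width, height],
  maxSHDegree: 3,
});
// Then record the single indirect draw for all models.
const pass = encoder.beginRenderPass(passDesc);
renderer.renderMulti(pass, pointClouds);
pass.end();
queue.submit([encoder.finish()]);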
Depth-Enabled Rendering
renderer.setDepthFormat('depth32float');
renderer.setDepthEnabled(true);
// Subsequent renderMulti() calls use depth pipeline
Per-Model Rendering (Legacy)
Legacy Path refers to the per-model rendering approach used before the introduction of multi-model batching (prepareMulti/renderMulti). While still supported, the batched approach is recommended even for single models.
Legacy Path Characteristics:
- Uses render(pass, pointCloud) method, called separately for each model
- Uses each point cloud's own splat2DBuffer (managed by PointCloud module)
- Uses cached per-cloud sort resources (WeakMap<PointCloud, PointCloudSortStuff>)
- Executes separate draw calls for each model
// Still uses prepareMulti for preprocessing (recommended)
renderer.prepareMulti(encoder, queue, [pointCloud], args);
// Legacy path: uses render() instead of renderMulti()
renderer.render(pass, pointCloud); // Uses per-cloud cache, separate draw
Note: Even for a single model, renderMulti() is recommended as it uses global buffers and offers better performance.
Troubleshooting
- Capacity Exceeded: If total points exceed globalCapacity, buffers are reallocated. Expect a brief frame spike but no crash.
- Mixed Color Modes: Ensure PointCloud.colorMode is set correctly ('sh' or 'rgb') so the renderer selects the right preprocessor.
- Depth Artifacts: Enable depth pipeline for Z-buffer testing, or disable it for pure back-to-front sorting.
- Zero Instance Count: Always call prepareMulti before renderMulti; preprocessing populates the indirect buffer.
- Stale Sort Results: Ensure prepareMulti runs every frame; sort resources are reset at the start of each preparation phase.
In summary, the renderer batches all Gaussian splat models into one global sort and one indirect draw, so per-frame CPU work is limited to one preprocessing dispatch per model plus a single sort and draw submission.