# ONNX Module
This document describes the ONNX/WebGPU integration used to generate Gaussian Splatting data directly on the GPU with no CPU readbacks. The module provides GPU-only inference for dynamic point-cloud generation and supports both static (one-time) and dynamic (per-frame) inference modes.
## Overview
- Execution Provider: onnxruntime-web (WebGPU)
- I/O Binding: All inputs and outputs are GPU buffers (no CPU roundtrips; see the sketch after this list)
- Device Sharing: Reuses the app's WebGPU device for buffer compatibility
- Precision Support: Automatic detection and configuration for float16, float32, int8, uint8
- Outputs:
  - `gaussian`: (N, 10) packed gaussian parameters (precision auto-detected)
  - `sh` or `rgb`: (N, colorDim) color data (SH or RGB, precision auto-detected)
  - `num_points`: int32[1] actual count per inference
- Consumers: `DynamicPointCloud` + `GaussianPreprocessor` (num_points GPU overwrite)
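Concretely, the execution provider and output location are chosen at session creation, and inputs wrap existing `GPUBuffer`s. Below is a minimal sketch, assuming onnxruntime-web's WebGPU execution provider and a hypothetical `cameraMatrixBuffer`; exact device-sharing mechanics depend on the ORT version:

```ts
import * as ort from 'onnxruntime-web';

// Sketch only: create a WebGPU session whose outputs stay GPU-resident.
const session = await ort.InferenceSession.create('/models/gaussians3d.onnx', {
  executionProviders: ['webgpu'],
  preferredOutputLocation: 'gpu-buffer', // keep outputs on the GPU
  // enableGraphCapture: true,           // optional WebGPU graph capture (see Key Features)
});

// Bind an existing GPUBuffer as a GPU-resident input tensor.
const cameraTensor = ort.Tensor.fromGpuBuffer(cameraMatrixBuffer, {
  dataType: 'float32',
  dims: [4, 4],
});
```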
## Key Features
- GPU-Only Pipeline: No CPU readbacks, all data stays on GPU
- Automatic Precision Detection: Detects data types from model metadata or output names (see the sketch after this list)
- Static & Dynamic Modes: Supports one-time inference or per-frame updates
- Color Mode Detection: Automatically detects SH vs RGB output
- Capacity Detection: Auto-detects max points from model metadata
- Graph Capture Support: Optional WebGPU graph capture for performance
- Resource Management: Proper cleanup and disposal of GPU resources
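As an illustration of the name-based path, a detector can fall back to output-name suffixes when model metadata is missing. This is a hypothetical heuristic, not the verbatim logic of src/onnx/precision-detector.ts:

```ts
// Hypothetical fallback: infer precision from an output-name suffix.
// The real detector may rely on model metadata instead.
function detectPrecisionFromName(outputName: string): 'float16' | 'float32' {
  return /(_fp16|_f16)$/.test(outputName) ? 'float16' : 'float32';
}

detectPrecisionFromName('gaussian_fp16'); // 'float16'
detectPrecisionFromName('gaussian');      // 'float32' (default)
```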
## Files of Interest
- `src/onnx/onnx_generator.ts` – High-level generator facade
- `src/onnx/onnx_gpu_io.ts` – Low-level GPU I/O binding
- `src/onnx/precision-detector.ts` – Automatic precision detection
- `src/onnx/precision-types.ts` – Precision type definitions
- `src/app/managers/onnx-manager.ts` – ONNX model lifecycle management
## Quick Start
### Basic Usage (via ONNXManager)
```ts
import { ONNXManager } from './app/managers/onnx-manager';

const onnxManager = new ONNXManager(modelManager);

// Load ONNX model (static inference)
const entry = await onnxManager.loadONNXModel(
  device,
  '/models/gaussians3d.onnx',
  cameraMatrix,
  projectionMatrix,
  'static-model',
  {
    staticInference: true,
    maxPoints: 2_000_000,
    debugLogging: true
  }
);

// Load ONNX model (dynamic per-frame inference)
const dynamicEntry = await onnxManager.loadONNXModel(
  device,
  '/models/gaussians3d_dynamic.onnx',
  cameraMatrix,
  projectionMatrix,
  'dynamic-model',
  {
    staticInference: false, // Enable per-frame updates
    maxPoints: 2_000_000,
    debugLogging: true
  }
);
```
### Direct Generator Usage
```ts
import { ONNXGenerator } from './onnx/onnx_generator';

const generator = new ONNXGenerator({
  modelUrl: '/models/gaussians3d.onnx',
  maxPoints: 2_000_000,
  debugLogging: true,
  device: gpuDevice
});

await generator.initialize();

await generator.generate({
  cameraMatrix: viewMatrix,
  projectionMatrix: projMatrix,
  time: performance.now() / 1000
});

// Access GPU buffers
const gaussianBuffer = generator.getGaussianBuffer();
const shBuffer = generator.getSHBuffer();
const countBuffer = generator.getCountBuffer();
```
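The getters return plain `GPUBuffer`s, so they can be bound straight into the app's compute or render passes. A sketch, where `preprocessLayout` is a hypothetical `GPUBindGroupLayout` owned by the app:

```ts
// Bind the generator outputs for a preprocessing pass (illustrative only).
const bindGroup = gpuDevice.createBindGroup({
  layout: preprocessLayout,
  entries: [
    { binding: 0, resource: { buffer: gaussianBuffer } }, // (N, 10) packed params
    { binding: 1, resource: { buffer: shBuffer } },       // SH or RGB color data
    { binding: 2, resource: { buffer: countBuffer } },    // int32[1] live count
  ],
});
```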
### Precision Configuration
```ts
// Manual precision override
const entry = await onnxManager.loadONNXModel(
  device,
  '/models/model.onnx',
  cam, proj, 'model',
  {
    precisionConfig: {
      gaussian: { dataType: 'float32', bytesPerElement: 4 },
      color: { dataType: 'float16', bytesPerElement: 2 }
    }
  }
);
```
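The override object mirrors the per-output shape above. A plausible reconstruction of the types in src/onnx/precision-types.ts (assumed, not verbatim):

```ts
// Assumed shape, inferred from the override example above.
type ONNXDataType = 'float16' | 'float32' | 'int8' | 'uint8';

interface OutputPrecision {
  dataType: ONNXDataType;
  bytesPerElement: number; // 2 (float16), 4 (float32), 1 (int8/uint8)
}

interface PrecisionConfig {
  gaussian?: OutputPrecision; // packed gaussian parameters
  color?: OutputPrecision;    // SH or RGB output
}
```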
## Data Flow
- Loading: `ONNXManager.loadONNXModel()` creates `ONNXGenerator` and `DynamicPointCloud`
- Initialization: Generator initializes the ONNX Runtime session and preallocates GPU buffers
- Inference: `ONNXGenerator.generate()` runs inference and writes to the GPU buffers
- Dynamic Updates: `AnimationManager` calls `DynamicPointCloud.update()` → `ONNXGenerator.generate()` per frame (see the sketch after this list)
- Rendering: The preprocessor reads `countBuffer` to update the instance count, then sorts and renders
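For dynamic models, the per-frame path reduces to calling `generate()` again with fresh matrices and time; outputs land in the same preallocated buffers. A minimal sketch using the `ONNXGenerator` instance from Quick Start, with `camera` as a placeholder matrix source:

```ts
// Per-frame dynamic inference; all writes stay on the GPU.
async function frame(timeMs: number) {
  await generator.generate({
    cameraMatrix: camera.viewMatrix,           // placeholder source
    projectionMatrix: camera.projectionMatrix, // placeholder source
    time: timeMs / 1000,
  });
  // The preprocessor reads countBuffer on the GPU; no readback is needed here.
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```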
See: Architecture and API Reference for details.
## Related Docs
- Architecture – Session lifecycle, buffer ownership, and precision-detection flow.
- API Reference – Generator, manager, and precision-config APIs with usage notes.
- Point Cloud Module – Shows how ONNX outputs feed `DynamicPointCloud`.
- Preprocess Module – Details how ONNX counts and precision flags drive projection.
- Timeline Module – Covers per-frame animation hooks for dynamic generators.
- Config Module – Documents how ORT WASM paths are configured before inference.
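On that last point, onnxruntime-web locates its runtime binaries through `ort.env.wasm.wasmPaths`, which must be set before the first session is created; the path below is only an example:

```ts
import * as ort from 'onnxruntime-web';

// Example path; must run before the first InferenceSession.create() call.
ort.env.wasm.wasmPaths = '/ort/';
```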