ONNX Module API Reference
Complete API documentation for the ONNX integration classes and helper utilities.
Table of Contents
- ONNXManager
- ONNXGenerator
- OnnxGpuIO
- PrecisionDetector
- Precision Types
- ONNXModelTester
- ONNXTestUtils
- Utility Functions
ONNXManager
Coordinates ONNX model loading, inference, and resource management.
Path: src/app/managers/onnx-manager.ts
constructor(modelManager: ModelManager)
Create a manager scoped to a ModelManager.
Methods
loadONNXModel(device, modelPath, cameraMatrix, projectionMatrix, name?, options?): Promise<ModelEntry>
Load an ONNX model from URL and create a DynamicPointCloud.
Parameters:
- device: GPUDevice - Shared WebGPU device
- modelPath: string - URL string to ONNX model
- cameraMatrix: Float32Array - Initial view matrix (4×4)
- projectionMatrix: Float32Array - Initial projection matrix (4×4)
- name?: string - Optional model name (auto-generated if not provided)
- options?: ONNXLoadOptions - Loading options
Returns: Promise<ModelEntry> - Model entry registered with ModelManager
ONNXLoadOptions:
interface ONNXLoadOptions {
staticInference?: boolean; // If true, run inference once; if false, enable per-frame updates
maxPoints?: number; // Manual override for buffer allocation (auto-detected if not provided)
debugLogging?: boolean; // Enable debug logging
precisionConfig?: PrecisionConfig; // Manual precision override
}
loadONNXFromFile(device, file, cameraMatrix?, projectionMatrix?): Promise<ModelEntry>
Load from a browser File object. Omitted matrices fall back to defaults (camera at (0, 0, 5) looking at the origin).
Parameters:
- device: GPUDevice - Shared WebGPU device
- file: File - Browser File object
- cameraMatrix?: Float32Array | null - Optional initial view matrix
- projectionMatrix?: Float32Array | null - Optional initial projection matrix
Returns: Promise<ModelEntry>
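The default view matrix mentioned above can be illustrated with a small sketch. This is an assumption about the layout (column-major 4×4, as used by WebGPU and gl-matrix), not the library's actual code: a camera at (0, 0, 5) looking at the origin with +Y up reduces to the identity with a translation of -5 along z.

```typescript
// Hypothetical sketch of the documented default (column-major 4x4 layout
// assumed). A camera at (0, 0, 5) looking at the origin yields a view
// matrix that is the identity translated by -5 on z.
function defaultViewMatrix(): Float32Array {
  const m = new Float32Array(16);   // zero-initialized
  m[0] = m[5] = m[10] = m[15] = 1;  // identity diagonal
  m[14] = -5;                       // translate world by -5 along z
  return m;
}
```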
updateCameraMatrices(modelName, cameraMatrix, projectionMatrix): Promise<void>
Update the view/projection matrices for a named ONNX model. Note: Currently a placeholder; dynamic models update automatically via AnimationManager.
disposeModel(modelId: string): void
Dispose a specific ONNX model by ID. Cleans up GPU buffers and ORT sessions.
dispose(): void
Dispose all ONNX resources. Cleans up all generators and point clouds.
getGenerator(modelId: string): ONNXGenerator | undefined
Get generator for a specific model (for debugging/advanced use).
getPointCloud(modelId: string): DynamicPointCloud | undefined
Get point cloud for a specific model (for debugging/advanced use).
hasONNXModels(): boolean
Return true if any ONNX models are managed.
getONNXModels(): string[]
Return array of all loaded ONNX model IDs.
getONNXPerformanceStats(): { modelCount: number; totalGenerators: number; totalPointClouds: number }
Return performance statistics:
- modelCount: Number of loaded ONNX models
- totalGenerators: Total number of ONNXGenerator instances
- totalPointClouds: Total number of DynamicPointCloud instances
ONNXGenerator
Runs ONNX inference and exposes GPU buffers. Provides a simplified facade over OnnxGpuIO.
Path: src/onnx/onnx_generator.ts
constructor(cfg: ONNXGeneratorConfig)
Configuration:
interface ONNXGeneratorConfig {
modelUrl: string; // ONNX model URL or path
maxPoints?: number; // Optional, will be auto-detected from model metadata
debugLogging?: boolean; // Enable debug logging
device?: GPUDevice; // Pass the app's WebGPU device to avoid mismatch
precisionConfig?: PrecisionConfig; // Manual precision override
}
Methods
Lifecycle:
- initialize(device?: GPUDevice): Promise<void> – Initialize the generator. Uses device parameter or cfg.device. Throws if no device available.
Inference:
- generate(inputData?: { cameraMatrix?: Float32Array, projectionMatrix?: Float32Array, time?: number }): Promise<void> – Run inference with optional inputs. For static models, can be called with {}. For dynamic models, provide camera matrices and time.
GPU Buffer Access:
- getGaussianBuffer(): GPUBuffer – Get preallocated Gaussian data buffer
- getSHBuffer(): GPUBuffer – Get preallocated color (SH/RGB) data buffer
- getCountBuffer(): GPUBuffer – Get point count buffer (int32)
Device & Inputs:
- getDevice(): GPUDevice – Get the WebGPU device
- getInputNames(): readonly string[] – Get model's expected input names
Metadata:
- getDetectedCapacity(): number – Get detected max points from model metadata
- getDetectedColorMode(): 'sh' | 'rgb' – Get detected color mode
- getDetectedColorDim(): number – Get detected color dimensions (48 for SH, 3 for RGB)
- getActualMaxPoints(): number – Get actual max points (detected or configured)
Precision Information:
- getGaussianPrecision(): PrecisionMetadata – Get precision metadata for gaussian output
- getColorPrecision(): PrecisionMetadata – Get precision metadata for color output
Cleanup:
- dispose(): void – Release all GPU resources and ORT session
OnnxGpuIO
Handles low-level GPU I/O binding with ONNX Runtime. Manages session creation, buffer allocation, and inference execution.
Path: src/onnx/onnx_gpu_io.ts
constructor()
Creates a new OnnxGpuIO instance. Must call init() before use.
Methods
init(cfg: OnnxGpuIOConfig & { precisionConfig?: PrecisionConfig }): Promise<void>
Initialize I/O with configuration.
Configuration:
interface OnnxGpuIOConfig {
modelUrl: string; // ONNX model path
maxPoints?: number; // Preset max points (optional, inferred from metadata)
device: GPUDevice; // Use app's existing WebGPU device
verbose?: boolean; // Enable verbose debug logging
precisionConfig?: PrecisionConfig; // Manual precision override
}
Process:
1. Initializes the ONNX Runtime environment
2. Creates a WebGPU-only InferenceSession (with graph capture fallback)
3. Detects capacity and color mode from metadata
4. Detects precision (or applies manual config)
5. Preallocates all GPU buffers with proper sizes and alignment
updateInputBuffers(view?: mat4, proj?: mat4, time?: number): void
Write inputs into preallocated GPU buffers. Updates cameraMatrixBuf, projMatrixBuf, and timeBuf.
runInference(input?: { cameraMatrix?: Float32Array, projectionMatrix?: Float32Array, time?: number }): Promise<void>
Execute inference once. Uses exclusive execution chain to prevent concurrent conflicts. Binds GPU buffers as feeds/fetches and executes session without CPU roundtrips.
Note: Uses OnnxGpuIO.runExclusive() internally to serialize inference calls.
destroy(): void
Release all resources: ORT session, GPU buffers, and device references.
Static Methods
runExclusive<T>(fn: () => Promise<T>): Promise<T>
Global exclusive execution coordinator. Ensures only one inference runs at a time to prevent ORT WebGPU IOBinding session conflicts.
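The pattern described above can be sketched with a shared promise chain. This is a minimal illustration of the technique, not OnnxGpuIO's actual implementation: each call is appended to a global chain, so at most one async task runs at a time regardless of how callers interleave.

```typescript
// Sketch of a global exclusive execution chain (assumed behavior, not the
// library source). Each submitted task waits for the previous one to
// settle before starting.
let chain: Promise<unknown> = Promise.resolve();

function runExclusive<T>(fn: () => Promise<T>): Promise<T> {
  // Run fn after the current chain settles, whether it resolved or rejected.
  const result = chain.then(() => fn(), () => fn());
  // Swallow errors on the chain itself so one failure does not block later calls.
  chain = result.catch(() => undefined);
  return result;
}
```

For example, two concurrent calls resolve strictly in submission order: a second task enqueued while the first is still awaiting will not start until the first completes.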
Public Properties
Output Buffers:
- gaussBuf: GPUBuffer – Preallocated Gaussian data buffer
- shBuf: GPUBuffer – Preallocated color (SH/RGB) data buffer
- countBuf: GPUBuffer – Point count buffer (int32)
Input Buffers:
- cameraMatrixBuf: GPUBuffer – Camera view matrix buffer (4×4 float32)
- projMatrixBuf: GPUBuffer – Projection matrix buffer (4×4 float32)
- timeBuf: GPUBuffer – Time input buffer (float32)
Session & Device:
- session: ort.InferenceSession – ONNX Runtime inference session
- device: GPUDevice – WebGPU device
Metadata:
- inputNames: readonly string[] – Model's expected input names
- maxPoints: number – Actual max points (detected or configured)
- actualPoints: number – Actual points returned by the model
Detection Results:
- detectedCapacity: number – Detected max points from metadata
- detectedColorMode: 'sh' | 'rgb' – Detected color mode
- detectedColorDim: number – Detected color dimensions
- detectedGaussOutputName: string | null – Detected gaussian output name
- detectedGaussFields: number – Number of gaussian fields (usually 10)
- detectedColorOutputName: string | null – Detected color output name
PrecisionDetector
Automatic precision detection for ONNX model outputs.
Path: src/onnx/precision-detector.ts
Static Methods
detectOutputPrecisionFromName(outputName: string): PrecisionMetadata
Detect precision from output name suffix.
Suffix Patterns:
- _f32, _float32 → float32 (4 bytes)
- _f16, _float16 → float16 (2 bytes)
- _i8, _int8 → int8 (1 byte)
- _u8, _uint8 → uint8 (1 byte)
- Default → float16 (2 bytes)
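The suffix rules above can be expressed as a small pure function. This is an illustrative sketch matching the documented patterns, not the library's actual source; the PrecisionMetadata shape is repeated here to keep it self-contained.

```typescript
// Sketch of name-suffix precision detection per the documented patterns
// (assumed implementation, not the library source).
type OnnxDataType = 'float32' | 'float16' | 'int8' | 'uint8';

interface PrecisionMetadata {
  dataType: OnnxDataType;
  bytesPerElement: number;
}

function detectOutputPrecisionFromName(outputName: string): PrecisionMetadata {
  const name = outputName.toLowerCase();
  if (/_f32$|_float32$/.test(name)) return { dataType: 'float32', bytesPerElement: 4 };
  if (/_f16$|_float16$/.test(name)) return { dataType: 'float16', bytesPerElement: 2 };
  if (/_i8$|_int8$/.test(name))     return { dataType: 'int8',    bytesPerElement: 1 };
  if (/_u8$|_uint8$/.test(name))    return { dataType: 'uint8',   bytesPerElement: 1 };
  return { dataType: 'float16', bytesPerElement: 2 }; // documented default
}
```

For example, an output named gaussians_f32 resolves to float32, while a name with no recognized suffix falls back to float16.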
detectFromMetadataPreferringNameSuffix(session: ort.InferenceSession, outputName: string): PrecisionMetadata
Detect precision with priority:
1. Check session output metadata for type information
2. Fall back to name-based detection if metadata is unavailable
extractQuantizationParams(session: ort.InferenceSession, tensorName: string): { scale?: number; zeroPoint?: number }
Extract quantization parameters (scale, zeroPoint) from model initializers. Best-effort extraction from model graph.
calculateBufferSize(dims: number[], precision: PrecisionMetadata): number
Calculate buffer size with 16-byte alignment.
Precision Types
Type definitions for precision handling.
Path: src/onnx/precision-types.ts
Types
type OnnxDataType = 'float32' | 'float16' | 'int8' | 'uint8';
interface PrecisionMetadata {
dataType: OnnxDataType;
bytesPerElement: number;
scale?: number; // For quantized int8/uint8
zeroPoint?: number; // For quantized int8/uint8
}
interface OutputBufferDescriptor {
name: string;
precision: PrecisionMetadata;
dims: number[];
sizeInBytes: number;
}
interface PrecisionConfig {
gaussian?: Partial<PrecisionMetadata>; // Override for gaussian output
color?: Partial<PrecisionMetadata>; // Override for color output
autoDetect?: boolean; // Legacy flag (deprecated)
}
Utility Functions
- align16(n: number): number – Align number to 16-byte boundary
- calcSizeInBytes(dims: number[], p: PrecisionMetadata): number – Calculate buffer size with alignment
- dataTypeToOrtString(p: PrecisionMetadata): 'float16' | 'float32' | 'int8' | 'uint8' – Convert to ORT type string
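The alignment helpers above have straightforward semantics; a sketch matching their documented behavior (an assumption, not the library source) looks like this:

```typescript
// Sketch of the documented alignment helpers (assumed implementations).
function align16(n: number): number {
  // Round n up to the next multiple of 16 bytes.
  return Math.ceil(n / 16) * 16;
}

interface PrecisionMetadata { dataType: string; bytesPerElement: number; }

function calcSizeInBytes(dims: number[], p: PrecisionMetadata): number {
  // Element count is the product of all dimensions; the byte size is
  // then padded to a 16-byte boundary for GPU buffer binding.
  const elements = dims.reduce((a, d) => a * d, 1);
  return align16(elements * p.bytesPerElement);
}
```

For example, align16(5) is 16, and a [1000, 10] float16 tensor (2 bytes per element) needs 20000 bytes, already 16-byte aligned.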
ONNXModelTester
Utility for loading and validating ONNX models in isolation.
Static methods
- initialize(): Promise<void> – Set up ONNX Runtime.
- loadModel(modelPath?: string): Promise<void> – Default path './models/gaussians3d.onnx'.
- testInference(inputs?: any): Promise<any> – Run a test inference.
- dispose(): void
ONNXTestUtils
Helpers for debugging and benchmarking.
Static methods
- printInferenceReport(generator): void
- compareWithReference(generator, referenceData): void
- validateOutputBuffers(generator): boolean
- measurePerformance(generator, iterations = 100): PerformanceMetrics
interface PerformanceMetrics {
averageTime: number;
minTime: number;
maxTime: number;
totalTime: number;
iterations: number;
}
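A hypothetical sketch of how a PerformanceMetrics record could be aggregated from per-iteration timings (the field names come from the interface above; the aggregation logic is an assumption about what measurePerformance reports):

```typescript
// Assumed aggregation of per-iteration timings into PerformanceMetrics.
interface PerformanceMetrics {
  averageTime: number;
  minTime: number;
  maxTime: number;
  totalTime: number;
  iterations: number;
}

function summarize(timesMs: number[]): PerformanceMetrics {
  const totalTime = timesMs.reduce((a, t) => a + t, 0);
  return {
    averageTime: totalTime / timesMs.length,
    minTime: Math.min(...timesMs),
    maxTime: Math.max(...timesMs),
    totalTime,
    iterations: timesMs.length,
  };
}
```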
Utility Functions
- testONNXModel(modelUrl: string, device: GPUDevice): Promise<TestResult>
- runONNXIntegrationTest(): Promise<void>
- runONNXPerformanceTest(): Promise<PerformanceMetrics>
interface TestResult {
success: boolean;
error?: string;
performance?: PerformanceMetrics;
outputValidation?: boolean;
}
Usage examples
Basic ONNX Generator Usage
import { ONNXGenerator } from './onnx/onnx_generator';
const generator = new ONNXGenerator({
modelUrl: '/models/gaussians3d.onnx',
maxPoints: 1_000_000,
debugLogging: true,
device: gpuDevice
});
await generator.initialize();
await generator.generate({
cameraMatrix: viewMatrix,
projectionMatrix: projMatrix,
time: performance.now() / 1000,
});
const gaussianBuffer = generator.getGaussianBuffer();
const shBuffer = generator.getSHBuffer();
const countBuffer = generator.getCountBuffer();
// Access precision information
const gaussPrecision = generator.getGaussianPrecision();
const colorPrecision = generator.getColorPrecision();
console.log(`Gaussian: ${gaussPrecision.dataType}, Color: ${colorPrecision.dataType}`);
Using ONNXManager
import { ONNXManager } from './app/managers/onnx-manager';
const onnxManager = new ONNXManager(modelManager);
// Load static model
const staticEntry = await onnxManager.loadONNXModel(
device,
'/models/static.onnx',
cameraMatrix,
projectionMatrix,
'static-model',
{
staticInference: true,
maxPoints: 2_000_000,
debugLogging: true
}
);
// Load dynamic model (per-frame updates)
const dynamicEntry = await onnxManager.loadONNXModel(
device,
'/models/dynamic.onnx',
cameraMatrix,
projectionMatrix,
'dynamic-model',
{
staticInference: false, // Enable per-frame updates
maxPoints: 2_000_000,
debugLogging: true
}
);
// Load with precision override
const quantizedEntry = await onnxManager.loadONNXModel(
device,
'/models/quantized.onnx',
cameraMatrix,
projectionMatrix,
'quantized-model',
{
precisionConfig: {
gaussian: { dataType: 'int8', bytesPerElement: 1 },
color: { dataType: 'int8', bytesPerElement: 1 }
}
}
);
// Access generator for advanced use
const generator = onnxManager.getGenerator(dynamicEntry.id);
if (generator) {
console.log(`Color mode: ${generator.getDetectedColorMode()}`);
console.log(`Capacity: ${generator.getDetectedCapacity()}`);
}
// Cleanup
onnxManager.disposeModel(staticEntry.id);
// or dispose all
onnxManager.dispose();
Performance testing
import { ONNXTestUtils } from './test_utils';
const metrics = ONNXTestUtils.measurePerformance(generator, 1000);
console.log(`Average: ${metrics.averageTime} ms`);
console.log(`Min: ${metrics.minTime} ms`);
console.log(`Max: ${metrics.maxTime} ms`);
ONNXTestUtils.printInferenceReport(generator);
Model testing
import { testONNXModel } from './test_loader';
const result = await testONNXModel('/models/test.onnx', device);
if (result.success) {
console.log('Model test passed');
console.log('Performance:', result.performance);
} else {
console.error('Model test failed:', result.error);
}
Notes
- Shared Device – The ONNX pipeline reuses the app's GPUDevice, guaranteeing buffer compatibility and avoiding device mismatch errors.
- GPU-Only – No CPU roundtrips; all buffers stay on the GPU. Inputs and outputs are bound directly as GPU buffers.
- Dynamic Models – Supports per-frame updates without reallocating resources. DynamicPointCloud is wired to ONNXGenerator for automatic updates.
- Precision Detection – Automatically detects data types from model metadata or output names. Supports manual override via PrecisionConfig.
- Exclusive Execution – Uses a global execution chain to prevent concurrent inference conflicts with ORT WebGPU IOBinding.
- Graph Capture – Supports WebGPU graph capture for performance, with automatic fallback if unsupported.
- Performance – Device sharing and buffer reuse keep inference fast; preallocated buffers eliminate allocation overhead.
- Debugging – Helper utilities (ONNXTestUtils, ONNXModelTester) simplify validation and profiling.
- Compatibility – Works for both static and dynamic inference with automatic metadata detection (capacity, color mode, precision).
- Resource Management – Proper cleanup via dispose() methods ensures GPU resources are released.