ONNX Module API Reference

Complete API documentation for the ONNX integration classes and helper utilities.

Table of Contents

  • ONNXManager
  • ONNXGenerator
  • OnnxGpuIO
  • PrecisionDetector
  • Precision Types
  • ONNXModelTester
  • ONNXTestUtils
  • Usage examples
  • Notes

ONNXManager

Coordinates ONNX model loading, inference, and resource management.

Path: src/app/managers/onnx-manager.ts

constructor(modelManager: ModelManager)

Create a manager scoped to a ModelManager.

Methods

loadONNXModel(device, modelPath, cameraMatrix, projectionMatrix, name?, options?): Promise<ModelEntry>

Load an ONNX model from a URL and create a DynamicPointCloud.

Parameters:

  • device: GPUDevice – Shared WebGPU device
  • modelPath: string – URL of the ONNX model
  • cameraMatrix: Float32Array – Initial view matrix (4×4)
  • projectionMatrix: Float32Array – Initial projection matrix (4×4)
  • name?: string – Optional model name (auto-generated if not provided)
  • options?: ONNXLoadOptions – Loading options

Returns: Promise<ModelEntry> - Model entry registered with ModelManager

ONNXLoadOptions:

interface ONNXLoadOptions {
  staticInference?: boolean;        // If true, run inference once; if false, enable per-frame updates
  maxPoints?: number;               // Manual override for buffer allocation (auto-detected if not provided)
  debugLogging?: boolean;           // Enable debug logging
  precisionConfig?: PrecisionConfig; // Manual precision override
}

loadONNXFromFile(device, file, cameraMatrix?, projectionMatrix?): Promise<ModelEntry>

Load from a browser File object. Optional matrices default to identity (camera at (0,0,5) looking at origin).

Parameters:

  • device: GPUDevice – Shared WebGPU device
  • file: File – Browser File object
  • cameraMatrix?: Float32Array | null – Optional initial view matrix
  • projectionMatrix?: Float32Array | null – Optional initial projection matrix

Returns: Promise<ModelEntry>
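
For example, a file picker can be wired to this method; a minimal sketch, where the '#onnx-file' input element, onnxManager, and device are assumptions:

const input = document.querySelector<HTMLInputElement>('#onnx-file');
input?.addEventListener('change', async () => {
  const file = input.files?.[0];
  if (!file) return;
  // Matrices omitted: the identity defaults described above are applied.
  const entry = await onnxManager.loadONNXFromFile(device, file);
  console.log(`Loaded model ${entry.id} from ${file.name}`);
});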

updateCameraMatrices(modelName, cameraMatrix, projectionMatrix): Promise<void>

Update the view/projection matrices for a named ONNX model. Note: Currently a placeholder; dynamic models update automatically via AnimationManager.

disposeModel(modelId: string): void

Dispose a specific ONNX model by ID. Cleans up GPU buffers and ORT sessions.

dispose(): void

Dispose all ONNX resources. Cleans up all generators and point clouds.

getGenerator(modelId: string): ONNXGenerator | undefined

Get generator for a specific model (for debugging/advanced use).

getPointCloud(modelId: string): DynamicPointCloud | undefined

Get point cloud for a specific model (for debugging/advanced use).

hasONNXModels(): boolean

Return true if any ONNX models are managed.

getONNXModels(): string[]

Return array of all loaded ONNX model IDs.

getONNXPerformanceStats(): { modelCount: number; totalGenerators: number; totalPointClouds: number }

Return performance statistics:

  • modelCount – Number of loaded ONNX models
  • totalGenerators – Total number of ONNXGenerator instances
  • totalPointClouds – Total number of DynamicPointCloud instances
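
For example, assuming an onnxManager instance as in the usage examples below:

const stats = onnxManager.getONNXPerformanceStats();
console.log(
  `${stats.modelCount} models, ${stats.totalGenerators} generators, ` +
  `${stats.totalPointClouds} point clouds`
);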


ONNXGenerator

Runs ONNX inference and exposes GPU buffers. Provides a simplified facade over OnnxGpuIO.

Path: src/onnx/onnx_generator.ts

constructor(cfg: ONNXGeneratorConfig)

Configuration:

interface ONNXGeneratorConfig {
  modelUrl: string;                    // ONNX model URL or path
  maxPoints?: number;                  // Optional, will be auto-detected from model metadata
  debugLogging?: boolean;              // Enable debug logging
  device?: GPUDevice;                  // Pass the app's WebGPU device to avoid mismatch
  precisionConfig?: PrecisionConfig;   // Manual precision override
}

Methods

Lifecycle:

  • initialize(device?: GPUDevice): Promise<void> – Initialize the generator. Uses the device parameter or cfg.device; throws if no device is available.

Inference:

  • generate(inputData?: { cameraMatrix?: Float32Array, projectionMatrix?: Float32Array, time?: number }): Promise<void> – Run inference with optional inputs. Static models can be called with {}; dynamic models should be given camera matrices and a time value.

GPU Buffer Access:

  • getGaussianBuffer(): GPUBuffer – Get preallocated Gaussian data buffer
  • getSHBuffer(): GPUBuffer – Get preallocated color (SH/RGB) data buffer
  • getCountBuffer(): GPUBuffer – Get point count buffer (int32)
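
These buffers can be consumed directly by a renderer. A sketch of binding them in a WebGPU bind group, where device, generator, renderPipeline, and the binding indices are illustrative rather than the renderer's actual layout:

const bindGroup = device.createBindGroup({
  layout: renderPipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: generator.getGaussianBuffer() } }, // Gaussian data
    { binding: 1, resource: { buffer: generator.getSHBuffer() } },       // SH/RGB colors
    { binding: 2, resource: { buffer: generator.getCountBuffer() } },    // point count (int32)
  ],
});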

Device & Inputs:

  • getDevice(): GPUDevice – Get the WebGPU device
  • getInputNames(): readonly string[] – Get model's expected input names

Metadata:

  • getDetectedCapacity(): number – Get detected max points from model metadata
  • getDetectedColorMode(): 'sh' | 'rgb' – Get detected color mode
  • getDetectedColorDim(): number – Get detected color dimensions (48 for SH, 3 for RGB)
  • getActualMaxPoints(): number – Get actual max points (detected or configured)

Precision Information:

  • getGaussianPrecision(): PrecisionMetadata – Get precision metadata for gaussian output
  • getColorPrecision(): PrecisionMetadata – Get precision metadata for color output

Cleanup:

  • dispose(): void – Release all GPU resources and ORT session


OnnxGpuIO

Handles low-level GPU I/O binding with ONNX Runtime. Manages session creation, buffer allocation, and inference execution.

Path: src/onnx/onnx_gpu_io.ts

constructor()

Creates a new OnnxGpuIO instance. Must call init() before use.

Methods

init(cfg: OnnxGpuIOConfig & { precisionConfig?: PrecisionConfig }): Promise<void>

Initialize I/O with configuration.

Configuration:

interface OnnxGpuIOConfig {
  modelUrl: string;                  // ONNX model path
  maxPoints?: number;                // Preset max points (optional, inferred from metadata)
  device: GPUDevice;                 // Use app's existing WebGPU device
  verbose?: boolean;                 // Enable verbose debug logging
  precisionConfig?: PrecisionConfig; // Manual precision override
}

Process:

  1. Initializes ONNX Runtime environment
  2. Creates WebGPU-only InferenceSession (with graph capture fallback)
  3. Detects capacity and color mode from metadata
  4. Detects precision (or applies manual config)
  5. Preallocates all GPU buffers with proper sizes and alignment

updateInputBuffers(view?: mat4, proj?: mat4, time?: number): void

Write inputs into preallocated GPU buffers. Updates cameraMatrixBuf, projMatrixBuf, and timeBuf.

runInference(input?: { cameraMatrix?: Float32Array, projectionMatrix?: Float32Array, time?: number }): Promise<void>

Execute inference once. Uses exclusive execution chain to prevent concurrent conflicts. Binds GPU buffers as feeds/fetches and executes session without CPU roundtrips.

Note: Uses OnnxGpuIO.runExclusive() internally to serialize inference calls.

destroy(): void

Release all resources: ORT session, GPU buffers, and device references.
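
Taken together, a direct (manager-free) lifecycle might look like this sketch, assuming a shared gpuDevice and view/projection matrices already exist in the app:

import { OnnxGpuIO } from './onnx/onnx_gpu_io';

const io = new OnnxGpuIO();
await io.init({
  modelUrl: '/models/gaussians3d.onnx',
  device: gpuDevice,
  verbose: true,
});

await io.runInference({
  cameraMatrix: viewMatrix,
  projectionMatrix: projMatrix,
  time: performance.now() / 1000,
});

// Outputs stay on the GPU in io.gaussBuf, io.shBuf, and io.countBuf.
io.destroy();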

Static Methods

runExclusive<T>(fn: () => Promise<T>): Promise<T>

Global exclusive execution coordinator. Ensures only one inference runs at a time to prevent ORT WebGPU IOBinding session conflicts.
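
Custom work that touches the same session can join this chain; a minimal sketch, reusing the io instance from the lifecycle example above:

const points = await OnnxGpuIO.runExclusive(async () => {
  await io.runInference({ time: performance.now() / 1000 });
  return io.actualPoints; // how many points the model produced
});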

Public Properties

Output Buffers:

  • gaussBuf: GPUBuffer – Preallocated Gaussian data buffer
  • shBuf: GPUBuffer – Preallocated color (SH/RGB) data buffer
  • countBuf: GPUBuffer – Point count buffer (int32)

Input Buffers:

  • cameraMatrixBuf: GPUBuffer – Camera view matrix buffer (4×4 float32)
  • projMatrixBuf: GPUBuffer – Projection matrix buffer (4×4 float32)
  • timeBuf: GPUBuffer – Time input buffer (float32)

Session & Device:

  • session: ort.InferenceSession – ONNX Runtime inference session
  • device: GPUDevice – WebGPU device

Metadata:

  • inputNames: readonly string[] – Model's expected input names
  • maxPoints: number – Actual max points (detected or configured)
  • actualPoints: number – Actual points returned by the model

Detection Results:

  • detectedCapacity: number – Detected max points from metadata
  • detectedColorMode: 'sh' | 'rgb' – Detected color mode
  • detectedColorDim: number – Detected color dimensions
  • detectedGaussOutputName: string | null – Detected gaussian output name
  • detectedGaussFields: number – Number of gaussian fields (usually 10)
  • detectedColorOutputName: string | null – Detected color output name


PrecisionDetector

Automatic precision detection for ONNX model outputs.

Path: src/onnx/precision-detector.ts

Static Methods

detectOutputPrecisionFromName(outputName: string): PrecisionMetadata

Detect precision from output name suffix.

Suffix Patterns:

  • _f32, _float32 → float32 (4 bytes)
  • _f16, _float16 → float16 (2 bytes)
  • _i8, _int8 → int8 (1 byte)
  • _u8, _uint8 → uint8 (1 byte)
  • Default → float16 (2 bytes)
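
For example, with illustrative output names:

import { PrecisionDetector } from './onnx/precision-detector';

PrecisionDetector.detectOutputPrecisionFromName('gaussians_f16'); // float16, 2 bytes/element
PrecisionDetector.detectOutputPrecisionFromName('colors_i8');     // int8, 1 byte/element
PrecisionDetector.detectOutputPrecisionFromName('gaussians');     // no suffix → default float16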

detectFromMetadataPreferringNameSuffix(session: ort.InferenceSession, outputName: string): PrecisionMetadata

Detect precision with priority:

  1. Check session output metadata for type information
  2. Fall back to name-based detection if metadata is unavailable

extractQuantizationParams(session: ort.InferenceSession, tensorName: string): { scale?: number; zeroPoint?: number }

Extract quantization parameters (scale, zeroPoint) from model initializers. Best-effort extraction from the model graph.

calculateBufferSize(dims: number[], precision: PrecisionMetadata): number

Calculate buffer size with 16-byte alignment.
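
A worked example: a float16 output of shape [100, 3] needs 100 × 3 × 2 = 600 bytes, which rounds up to 608 at the next 16-byte boundary:

const size = PrecisionDetector.calculateBufferSize(
  [100, 3],
  { dataType: 'float16', bytesPerElement: 2 }
);
// size === 608 (600 bytes aligned up to a 16-byte boundary)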


Precision Types

Type definitions for precision handling.

Path: src/onnx/precision-types.ts

Types

type OnnxDataType = 'float32' | 'float16' | 'int8' | 'uint8';

interface PrecisionMetadata {
  dataType: OnnxDataType;
  bytesPerElement: number;
  scale?: number;      // For quantized int8/uint8
  zeroPoint?: number;  // For quantized int8/uint8
}

interface OutputBufferDescriptor {
  name: string;
  precision: PrecisionMetadata;
  dims: number[];
  sizeInBytes: number;
}

interface PrecisionConfig {
  gaussian?: Partial<PrecisionMetadata>;  // Override for gaussian output
  color?: Partial<PrecisionMetadata>;     // Override for color output
  autoDetect?: boolean;                    // Legacy flag (deprecated)
}

Utility Functions

  • align16(n: number): number – Align number to 16-byte boundary
  • calcSizeInBytes(dims: number[], p: PrecisionMetadata): number – Calculate buffer size with alignment
  • dataTypeToOrtString(p: PrecisionMetadata): 'float16' | 'float32' | 'int8' | 'uint8' – Convert to ORT type string
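
Assuming these helpers are exported from the module above, a quick illustration (same alignment rule as calculateBufferSize):

import { align16, calcSizeInBytes, dataTypeToOrtString } from './onnx/precision-types';

align16(600);                                                           // 608
calcSizeInBytes([100, 3], { dataType: 'float16', bytesPerElement: 2 }); // 608
dataTypeToOrtString({ dataType: 'float16', bytesPerElement: 2 });       // 'float16'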

ONNXModelTester

Utility for loading and validating ONNX models in isolation.

Static methods

  • initialize(): Promise<void> – Set up ONNX Runtime.
  • loadModel(modelPath?: string): Promise<void> – Default path './models/gaussians3d.onnx'.
  • testInference(inputs?: any): Promise<any> – Run a test inference.
  • dispose(): void
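
A minimal validation pass using these methods (the import path is not listed above, so it is omitted; the model path is the documented default):

await ONNXModelTester.initialize();
await ONNXModelTester.loadModel('./models/gaussians3d.onnx');
const outputs = await ONNXModelTester.testInference();
console.log('Test inference outputs:', outputs);
ONNXModelTester.dispose();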

ONNXTestUtils

Helpers for debugging and benchmarking.

Static methods

  • printInferenceReport(generator): void
  • compareWithReference(generator, referenceData): void
  • validateOutputBuffers(generator): boolean
  • measurePerformance(generator, iterations = 100): PerformanceMetrics

interface PerformanceMetrics {
  averageTime: number;
  minTime: number;
  maxTime: number;
  totalTime: number;
  iterations: number;
}

Utility Functions

  • testONNXModel(modelUrl: string, device: GPUDevice): Promise<TestResult>
  • runONNXIntegrationTest(): Promise<void>
  • runONNXPerformanceTest(): Promise<PerformanceMetrics>

interface TestResult {
  success: boolean;
  error?: string;
  performance?: PerformanceMetrics;
  outputValidation?: boolean;
}

Usage examples

Basic ONNX Generator Usage

import { ONNXGenerator } from './onnx/onnx_generator';

const generator = new ONNXGenerator({
  modelUrl: '/models/gaussians3d.onnx',
  maxPoints: 1_000_000,
  debugLogging: true,
  device: gpuDevice
});

await generator.initialize();
await generator.generate({
  cameraMatrix: viewMatrix,
  projectionMatrix: projMatrix,
  time: performance.now() / 1000,
});

const gaussianBuffer = generator.getGaussianBuffer();
const shBuffer = generator.getSHBuffer();
const countBuffer = generator.getCountBuffer();

// Access precision information
const gaussPrecision = generator.getGaussianPrecision();
const colorPrecision = generator.getColorPrecision();
console.log(`Gaussian: ${gaussPrecision.dataType}, Color: ${colorPrecision.dataType}`);

Using ONNXManager

import { ONNXManager } from './app/managers/onnx-manager';

const onnxManager = new ONNXManager(modelManager);

// Load static model
const staticEntry = await onnxManager.loadONNXModel(
  device,
  '/models/static.onnx',
  cameraMatrix,
  projectionMatrix,
  'static-model',
  { 
    staticInference: true,
    maxPoints: 2_000_000,
    debugLogging: true
  }
);

// Load dynamic model (per-frame updates)
const dynamicEntry = await onnxManager.loadONNXModel(
  device,
  '/models/dynamic.onnx',
  cameraMatrix,
  projectionMatrix,
  'dynamic-model',
  { 
    staticInference: false,  // Enable per-frame updates
    maxPoints: 2_000_000,
    debugLogging: true
  }
);

// Load with precision override
const quantizedEntry = await onnxManager.loadONNXModel(
  device,
  '/models/quantized.onnx',
  cameraMatrix,
  projectionMatrix,
  'quantized-model',
  {
    precisionConfig: {
      gaussian: { dataType: 'int8', bytesPerElement: 1 },
      color: { dataType: 'int8', bytesPerElement: 1 }
    }
  }
);

// Access generator for advanced use
const generator = onnxManager.getGenerator(dynamicEntry.id);
if (generator) {
  console.log(`Color mode: ${generator.getDetectedColorMode()}`);
  console.log(`Capacity: ${generator.getDetectedCapacity()}`);
}

// Cleanup
onnxManager.disposeModel(staticEntry.id);
// or dispose all
onnxManager.dispose();

Performance testing

import { ONNXTestUtils } from './test_utils';

const metrics = ONNXTestUtils.measurePerformance(generator, 1000);
console.log(`Average: ${metrics.averageTime} ms`);
console.log(`Min: ${metrics.minTime} ms`);
console.log(`Max: ${metrics.maxTime} ms`);

ONNXTestUtils.printInferenceReport(generator);

Model testing

import { testONNXModel } from './test_loader';

const result = await testONNXModel('/models/test.onnx', device);
if (result.success) {
  console.log('Model test passed');
  console.log('Performance:', result.performance);
} else {
  console.error('Model test failed:', result.error);
}

Notes

  1. Shared Device – The ONNX pipeline reuses the app's GPUDevice, guaranteeing buffer compatibility and avoiding device mismatch errors.

  2. GPU-Only – No CPU roundtrips; all buffers stay on the GPU. Inputs and outputs are bound directly as GPU buffers.

  3. Dynamic Models – Supports per-frame updates without reallocating resources. DynamicPointCloud is wired to ONNXGenerator for automatic updates (a manual per-frame sketch follows these notes).

  4. Precision Detection – Automatically detects data types from model metadata or output names. Supports manual override via PrecisionConfig.

  5. Exclusive Execution – Uses global execution chain to prevent concurrent inference conflicts with ORT WebGPU IOBinding.

  6. Graph Capture – Supports WebGPU graph capture for performance (with automatic fallback if unsupported).

  7. Performance – Device sharing and buffer reuse keep inference fast. Preallocated buffers eliminate allocation overhead.

  8. Debugging – Helper utilities (ONNXTestUtils, ONNXModelTester) simplify validation and profiling.

  9. Compatibility – Works for both static and dynamic inference with automatic metadata detection (capacity, color mode, precision).

  10. Resource Management – Proper cleanup via dispose() methods ensures GPU resources are released.
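
Following up on note 3, a minimal manual per-frame loop for a dynamic model; a sketch only, where the camera and projection sources are assumptions and the app normally drives this via AnimationManager:

const generator = onnxManager.getGenerator(dynamicEntry.id);

function frame(timeMs: number) {
  if (generator) {
    // Fire-and-forget: OnnxGpuIO.runExclusive serializes overlapping calls.
    void generator.generate({
      cameraMatrix: camera.viewMatrix,           // assumed camera source
      projectionMatrix: camera.projectionMatrix, // assumed projection source
      time: timeMs / 1000,
    });
  }
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);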