Sorting Module API Reference

This reference covers the public surface exported from src/sort/. It mirrors the TypeScript implementation of the GPU radix sorter (GPURSSorter) and the interfaces shared with the renderer and preprocessor.

Exports

// src/sort/index.ts
export interface ISorter { ... }
export interface SortedSplats { ... }
export { GPURSSorter, HISTOGRAM_WG_SIZE, RS_HISTOGRAM_BLOCK_ROWS } from './radix_sort';
export type { PointCloudSortStuff } from './radix_sort';

Interfaces

ISorter

Contract for sorter implementations.

interface ISorter {
  createSortStuff(device: GPUDevice, numPoints: number): SortedSplats;
  recordSort(sortStuff: SortedSplats, numPoints: number, encoder: GPUCommandEncoder): void;
  recordSortIndirect?(sortStuff: SortedSplats, dispatchBuffer: GPUBuffer, encoder: GPUCommandEncoder): void;
}

Most callers interact with the concrete GPURSSorter, but the renderer stores sorters through this interface so alternative implementations can be swapped in for testing.

SortedSplats

interface SortedSplats {
  numPoints: number;
  sortedIndices: GPUBuffer;
  indirectBuffer: GPUBuffer;
  visibleCount?: number;
  [key: string]: any; // implementation specific context
}

PointCloudSortStuff extends this interface with the additional buffers required by GPURSSorter.

PointCloudSortStuff

interface PointCloudSortStuff extends SortedSplats {
  num_points: number;      // alias for compatibility
  sorter_uni: GPUBuffer;   // GeneralInfo storage buffer
  sorter_dis: GPUBuffer;   // Indirect dispatch buffer
  sorter_bg: GPUBindGroup; // Radix pipelines
  sorter_bg_pre: GPUBindGroup; // Preprocess group
  sorter_render_bg: GPUBindGroup; // Renderer group
  internal_mem: GPUBuffer;
  key_a: GPUBuffer;
  key_b: GPUBuffer;
  payload_a: GPUBuffer;
  payload_b: GPUBuffer;
}

GPURSSorter class

class GPURSSorter implements ISorter {
  static async create(device: GPUDevice, queue: GPUQueue): Promise<GPURSSorter>;
  createSortStuff(device: GPUDevice, numPoints: number): PointCloudSortStuff;
  recordSort(sortStuff: SortedSplats, numPoints: number, encoder: GPUCommandEncoder): void;
  recordSortIndirect(sortStuff: SortedSplats, dispatchBuffer: GPUBuffer, encoder: GPUCommandEncoder): void;
  recordResetIndirectBuffer(indirectBuffer: GPUBuffer, uniformBuffer: GPUBuffer, queue: GPUQueue): void;
  static createRenderBindGroupLayout(device: GPUDevice): GPUBindGroupLayout;
  static createPreprocessBindGroupLayout(device: GPUDevice): GPUBindGroupLayout;
}

GPURSSorter.create(device, queue)

  • Async factory that probes several subgroup sizes (16, 32, 16, 8, 1).
  • Builds all compute pipelines (zero, histogram, prefix, scatter_even, scatter_odd) for each candidate.
  • Runs testSort (sorting 8,192 floats) using recordSort to ensure the configuration works on the current adapter.
  • Returns a configured sorter or throws if no configuration succeeds.

createSortStuff(device, numPoints)

  • Allocates the key/payload ping-pong buffers, the internal scratch buffer, and the GeneralInfo and indirect dispatch buffers for numPoints keys, with the key count rounded up to a multiple of 3840 (256 * 15).
  • Builds three bind groups: sorter_bg (radix passes), sorter_bg_pre (preprocessor), and sorter_render_bg (renderer).
  • Returns a PointCloudSortStuff instance that can be cached per point cloud.
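The rounding rule above can be sketched as follows. This is a minimal illustration, not the module's code: the helper names paddedKeyCount and keyBufferBytes are hypothetical, and the 4 bytes per key figure is an assumption based on the 32-bit keys mentioned under GeneralInfo.

```typescript
// Block size taken from the text: 256 threads * 15 rows = 3840 keys.
const KEYS_PER_BLOCK = 256 * 15;

// Hypothetical helper: round a key count up to a whole number of blocks.
function paddedKeyCount(numPoints: number): number {
  return Math.ceil(numPoints / KEYS_PER_BLOCK) * KEYS_PER_BLOCK;
}

// Hypothetical helper: byte size of one key buffer, assuming 4-byte keys.
function keyBufferBytes(numPoints: number): number {
  return paddedKeyCount(numPoints) * 4;
}
```

Under these assumptions a cloud of 10,000 points would be padded to 11,520 keys, so every workgroup can process a full block without bounds checks on the tail.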

recordSort(sortStuff, numPoints, encoder)

  • Records zero -> histogram -> prefix -> scatter with explicit workgroup counts derived from numPoints.
  • Used mainly by the self-test and by simple paths where the number of keys is known on the CPU.
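For checking sorter output in tests, a CPU reference of the same pass sequence can help. The sketch below is not the GPU implementation: it assumes 8-bit digits (four passes over 32-bit keys), assumes even passes read the A buffers and odd passes read the B buffers, and recomputes the histogram on every pass for simplicity. The function name radixSortReference is hypothetical.

```typescript
const RADIX_BITS = 8;                 // assumed digit width
const RADIX_SIZE = 1 << RADIX_BITS;   // 256 buckets per pass
const PASSES = 32 / RADIX_BITS;       // 4 passes for 32-bit keys

// Stable LSD radix sort of unsigned 32-bit keys with a payload per key.
function radixSortReference(
  keys: Uint32Array,
  payload: Uint32Array
): { keys: Uint32Array; payload: Uint32Array } {
  const n = keys.length;
  const keyA = Uint32Array.from(keys);
  const payloadA = Uint32Array.from(payload);
  const keyB = new Uint32Array(n);
  const payloadB = new Uint32Array(n);

  for (let pass = 0; pass < PASSES; pass++) {
    // Ping-pong: even passes read A and write B, odd passes do the reverse.
    const even = pass % 2 === 0;
    const srcKeys = even ? keyA : keyB;
    const srcPayload = even ? payloadA : payloadB;
    const dstKeys = even ? keyB : keyA;
    const dstPayload = even ? payloadB : payloadA;
    const shift = pass * RADIX_BITS;

    // zero + histogram: count how many keys fall into each digit bucket.
    const histogram = new Uint32Array(RADIX_SIZE);
    for (let i = 0; i < n; i++) {
      histogram[(srcKeys[i] >>> shift) & (RADIX_SIZE - 1)]++;
    }

    // prefix: exclusive scan turns counts into starting offsets.
    let sum = 0;
    for (let d = 0; d < RADIX_SIZE; d++) {
      const count = histogram[d];
      histogram[d] = sum;
      sum += count;
    }

    // scatter: move each key/payload pair to its slot, preserving order.
    for (let i = 0; i < n; i++) {
      const digit = (srcKeys[i] >>> shift) & (RADIX_SIZE - 1);
      const dst = histogram[digit]++;
      dstKeys[dst] = srcKeys[i];
      dstPayload[dst] = srcPayload[i];
    }
  }

  // PASSES is even, so the final pass wrote back into the A buffers.
  return { keys: keyA, payload: payloadA };
}
```

Because the pass count is even, the sorted result ends up in the A buffers, which is consistent with the renderer reading payload_a. The reference sorts raw unsigned integers; depth keys stored as floats would only order correctly this way if they are non-negative.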

recordSortIndirect(sortStuff, dispatchBuffer, encoder)

  • Same passes as above but dispatches zero, histogram, and both scatter entry points indirectly using the dispatchBuffer (usually sorter_dis).
  • Calls recordPrefixHistogram directly because the prefix pass always uses the same workgroup count, independent of the number of keys.
  • Used by the renderer; preprocessing is responsible for writing the final value of dispatch_x before sorting begins.

recordResetIndirectBuffer(indirectBuffer, uniformBuffer, queue)

  • Writes zero into both the indirect dispatch buffer and the first dword of GeneralInfo (keys_size).
  • Invoked before preprocessing so the subsequent atomic increments start from a known state.
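The reset described above amounts to two small queue writes. The following is a sketch under assumptions rather than the module's code: WriteBufferQueue is a hypothetical structural stand-in for the part of GPUQueue that is needed (so the example runs without a GPU), resetSortCounters is a hypothetical name, and it assumes only dispatch_x (the first dword of the indirect buffer) is cleared.

```typescript
// Hypothetical stand-in for the subset of GPUQueue used here.
interface WriteBufferQueue<B> {
  writeBuffer(buffer: B, bufferOffset: number, data: Uint32Array): void;
}

// Clears the two counters that preprocessing increments atomically.
function resetSortCounters<B>(
  indirectBuffer: B,
  uniformBuffer: B,
  queue: WriteBufferQueue<B>
): void {
  const zero = new Uint32Array([0]);
  // Assumed: dispatch_x lives at byte offset 0 of the indirect buffer.
  queue.writeBuffer(indirectBuffer, 0, zero);
  // keys_size is the first dword of GeneralInfo, as stated in the text.
  queue.writeBuffer(uniformBuffer, 0, zero);
}
```

Queue writes are ordered before any command buffers submitted afterwards, so issuing the reset before submitting the preprocessing work is enough for the counters to start from zero.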

Static layout helpers

  • createRenderBindGroupLayout exposes sorter_uni and payload_a to vertex/compute stages (@group(1) in the renderer).
  • createPreprocessBindGroupLayout exposes sorter_uni, key_a, payload_a, and sorter_dis to preprocessing (@group(2)).

Supporting structs

GeneralInfo

interface GeneralInfo {
  keys_size: number;   // number of visible splats written by preprocessing
  padded_size: number; // key count rounded up to a multiple of 3840
  passes: number;      // always 4 for 32-bit keys
  even_pass: number;   // bit mask for passes handled by scatter_even
  odd_pass: number;    // bit mask for passes handled by scatter_odd
}

GeneralInfo is stored in sorter_uni, which is bound as a storage buffer. The renderer copies keys_size into the draw indirect buffer after sorting.
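To make the layout concrete, here is one way the struct could be packed for upload. It is a sketch that assumes all five fields are tightly packed 32-bit unsigned integers in declaration order, which the text implies for keys_size (the first dword) but does not state for the rest. The names GeneralInfoFields, GENERAL_INFO_BYTES and packGeneralInfo are hypothetical.

```typescript
// Local copy of the fields listed above, so the example is self-contained.
interface GeneralInfoFields {
  keys_size: number;
  padded_size: number;
  passes: number;
  even_pass: number;
  odd_pass: number;
}

// Assumed size: five u32 fields with no padding.
const GENERAL_INFO_BYTES = 5 * 4;

// Packs the fields in declaration order; keys_size lands at byte offset 0.
function packGeneralInfo(info: GeneralInfoFields): Uint32Array {
  return new Uint32Array([
    info.keys_size,
    info.padded_size,
    info.passes,
    info.even_pass,
    info.odd_pass,
  ]);
}
```

If the shader-side struct uses a different field order or padding, the offsets here would need to change to match it.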

IndirectDispatch

interface IndirectDispatch {
  dispatch_x: number;
  dispatch_y: number;
  dispatch_z: number;
}

Preprocessing increments dispatch_x once per 256 * 15 (3840) splats. The sorter uses this structure for dispatchWorkgroupsIndirect calls, and the renderer later shares the same buffer when issuing indirect draws.
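The relationship between keys_size and dispatch_x can be modelled on the CPU. The sketch below is an assumption about how the counters behave, not shader code from the module: it assumes each visible splat increments keys_size, and that dispatch_x is bumped by the first splat of every 3840-key block so a partially filled block still gets a workgroup. The function name simulatePreprocessCounters is hypothetical.

```typescript
// Block size taken from the text: 256 * 15 splats per workgroup.
const SPLATS_PER_WORKGROUP = 256 * 15;

// Sequential model of the atomic counters written during preprocessing.
function simulatePreprocessCounters(visibleSplats: number): {
  keysSize: number;
  dispatchX: number;
} {
  let keysSize = 0;
  let dispatchX = 0;
  for (let i = 0; i < visibleSplats; i++) {
    const before = keysSize; // value an atomic add would return
    keysSize += 1;
    // Assumed rule: the splat that opens a new block adds one workgroup.
    if (before % SPLATS_PER_WORKGROUP === 0) {
      dispatchX += 1;
    }
  }
  return { keysSize, dispatchX };
}
```

Under this model dispatch_x always equals ceil(keys_size / 3840), which is the workgroup count the indirect histogram and scatter dispatches would need.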

Example

const sorter = await GPURSSorter.create(device, device.queue);
const sortStuff = sorter.createSortStuff(device, totalPoints);

// Reset counters before preprocessing
sorter.recordResetIndirectBuffer(sortStuff.sorter_dis, sortStuff.sorter_uni, device.queue);

// ... preprocessing writes depth keys into sortStuff.key_a / payload_a ...

const encoder = device.createCommandEncoder();
sorter.recordSortIndirect(sortStuff, sortStuff.sorter_dis, encoder);
device.queue.submit([encoder.finish()]);

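// Use in a render pass. `pass` is an active GPURenderPassEncoder, and
// drawIndirectBuffer holds the draw arguments, including the keys_size
// value that the renderer copies in after sorting.
pass.setBindGroup(1, sortStuff.sorter_render_bg);
pass.drawIndirect(drawIndirectBuffer, 0);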