Full-body CT multi-class segmentation in the browser.
A single sliding-window pass produces all 118 class channels
(MONAI EVERYTHING_PROMPT) via a tile-based GPU accumulator.
Encoder/decoder/post_mapping run once per patch, shared across all
classes; only the final class-embedding matmul scales with N_cls.
Runs in a Web Worker, so the UI stays live during the multi-minute sliding-window run.
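The cost split above (backbone once per patch, only the class-embedding matmul scaling with N_cls) can be sketched as follows. This is an illustrative CPU sketch with made-up names, not the app's actual GPU kernel:

```typescript
// Illustrative sketch: the backbone (encoder/decoder/post_mapping)
// produces per-voxel features ONCE per patch; each class channel is
// then just a dot product of those features with that class's
// embedding. Names and shapes here are assumptions for clarity.
type Tensor = Float32Array;

function classLogitsForPatch(
  features: Tensor,      // [nVoxels * featDim], computed once by the backbone
  classEmbeds: Tensor[], // one [featDim] embedding per class (length = N_cls)
  featDim: number,
): Tensor[] {
  const nVox = features.length / featDim;
  return classEmbeds.map((emb) => {
    const out = new Float32Array(nVox);
    for (let v = 0; v < nVox; v++) {
      let s = 0;
      for (let f = 0; f < featDim; f++) s += features[v * featDim + f] * emb[f];
      out[v] = s; // logit for this class at voxel v
    }
    return out;
  });
}
```

Only the final `map` over `classEmbeds` repeats per class, which is why going from 1 to 118 classes adds far less than 118× the per-patch cost.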
·
← v2 (single-class)
✓ Node M1 verified (2026-04-24):
213×213×163 / 18 patches / 118 classes → 71.5 s total,
2 tiles × 1.75/1.74 GB accum.
Liver voxel recall 99.23% vs the single-class reference.
Single-class rel_rms 4.4e-7 vs MONAI SlidingWindowInferer.
All fp32, same precision as PyTorch MPS (~1e-6 floor).
⚠ Download: ~786 MB weights + 28 MB canonical CT.
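The rel_rms parity figures quoted above can be computed like this (assumed definition: RMS of the element-wise difference divided by RMS of the reference; the notes don't spell out the formula, so this is my reading):

```typescript
// Relative RMS error between a test volume and a reference volume.
// Assumed definition: sqrt(sum((test-ref)^2)) / sqrt(sum(ref^2)).
function relRms(test: Float32Array, ref: Float32Array): number {
  let diffSq = 0;
  let refSq = 0;
  for (let i = 0; i < ref.length; i++) {
    const d = test[i] - ref[i];
    diffSq += d * d;
    refSq += ref[i] * ref[i];
  }
  return Math.sqrt(diffSq / refSq);
}
```

With fp32 accumulation throughout, values near 1e-6..1e-7 (as quoted) sit at the expected fp32 rounding floor.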
✨ Dual memory mode — auto-picks per device:
Fast (smart-scatter): all N tile accumulators alive; each
patch forward runs once. Peak ~8 GB at 133 classes × 213³;
verified 71.5 s on current M1 Node WebGPU.
Safe (outer-loop): one tile's accum alive at a time; patches
straddling tile boundaries run forward 2-3×. Peak ~5 GB; ~2.3× slower
(~165 s estimated after BM48 optimization) but byte-identical labels to Fast.
Auto default: memory budget 10 GB on Apple UMA, 6 GB on discrete GPUs.
If the estimated smart-mode peak fits the budget → Fast, else Safe.
Real clinical CT (512×512×300 → canonical 239×239×200)
pushes the peak to ~10 GB, which fits M1 Pro/Max UMA in Fast mode; an 8 GB M1
or an 8 GB discrete GPU should pick Safe. 2 GB integrated GPUs still require fp16 (not implemented yet).
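The auto pick described above reduces to a small heuristic. A minimal sketch, assuming the budget numbers from these notes and an externally supplied peak estimate (the function and parameter names are illustrative):

```typescript
// Hedged sketch of the auto memory-mode selection. Budgets come from
// the notes (Apple UMA 10 GB, discrete 6 GB); the peak estimator is
// assumed to exist elsewhere and is passed in as a number.
type MemMode = "fast" | "safe";

function pickMemMode(
  estimatedFastPeakGB: number, // predicted smart-scatter (Fast) peak
  isAppleUMA: boolean,         // unified memory, e.g. Apple Silicon
): MemMode {
  const budgetGB = isAppleUMA ? 10 : 6;
  return estimatedFastPeakGB < budgetGB ? "fast" : "safe";
}
```

With the demo volume's ~8 GB Fast peak this yields Fast on Apple UMA but Safe on a 6 GB discrete budget, matching the behavior described above.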
The demo CT uses a Python-preprocessed canonical volume (bit-exact vs MONAI; this is what yields the "liver recall 99.23%" parity).
The upload path runs our on-device preprocess (ScaleIntensityRange + Orient→RAS + Spacing→1.5 mm iso, trilinear, align_corners=False);
it diverges ~5% rel_rms from VISTA3D's training align_corners=True convention, which is acceptable for upload visualization.
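The ScaleIntensityRange step of that on-device preprocess follows MONAI's semantics as I understand them: the input window [a_min, a_max] is mapped linearly to [b_min, b_max], optionally clipped. A minimal sketch (the window values in the test are placeholders, not the app's actual CT window):

```typescript
// Sketch of MONAI-style ScaleIntensityRange: linear remap of the
// input window [aMin, aMax] to [bMin, bMax], with optional clipping.
function scaleIntensityRange(
  vol: Float32Array,
  aMin: number, aMax: number, // input intensity window (e.g. HU range)
  bMin: number, bMax: number, // output range (e.g. 0..1)
  clip = true,
): Float32Array {
  const out = new Float32Array(vol.length);
  const scale = (bMax - bMin) / (aMax - aMin);
  for (let i = 0; i < vol.length; i++) {
    let x = (vol[i] - aMin) * scale + bMin;
    if (clip) x = Math.min(bMax, Math.max(bMin, x));
    out[i] = x;
  }
  return out;
}
```

The Orient→RAS and 1.5 mm trilinear resample steps are separate; the align_corners=False vs True divergence noted above comes from the resample, not from this intensity remap.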