Full-body CT multi-class segmentation in the browser.
A single sliding-window pass produces all 118 class channels
(MONAI EVERYTHING_PROMPT) via a tile-based GPU accumulator.
Encoder/decoder/post_mapping run once per patch, shared across all
classes; only the final class-embedding matmul scales with N_cls.
Runs in a Web Worker, so the UI stays live during the multi-minute sliding-window run.
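The cost split above (backbone once per patch, only the class-embedding matmul scaling with N_cls) can be sketched as follows. This is an illustrative CPU sketch with made-up names, not the app's actual GPU kernel:

```typescript
// Illustrative sketch: the backbone (encoder/decoder/post_mapping)
// produces per-voxel features ONCE per patch; each class channel is
// then just a dot product of those features with that class's
// embedding. Names and shapes here are assumptions for clarity.
type Tensor = Float32Array;

function classLogitsForPatch(
  features: Tensor,      // [nVoxels * featDim], computed once by the backbone
  classEmbeds: Tensor[], // one [featDim] embedding per class (length = N_cls)
  featDim: number,
): Tensor[] {
  const nVox = features.length / featDim;
  return classEmbeds.map((emb) => {
    const out = new Float32Array(nVox);
    for (let v = 0; v < nVox; v++) {
      let s = 0;
      for (let f = 0; f < featDim; f++) s += features[v * featDim + f] * emb[f];
      out[v] = s; // logit for this class at voxel v
    }
    return out;
  });
}
```

Only the final `map` over `classEmbeds` repeats per class, which is why going from 1 to 118 classes adds far less than 118× the per-patch cost.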
·
← v2 (single-class)
✓ Node M1 verified (2026-04-24):
213×213×163 / 18 patches / 118 classes → 71.5 s total,
2 tiles × 1.75/1.74 GB accum.
Liver voxel recall 99.23% vs the single-class reference.
Single-class rel_rms 4.4e-7 vs MONAI SlidingWindowInferer.
All fp32, same precision as PyTorch MPS (~1e-6 floor).
⚠ Download: ~786 MB weights + 28 MB canonical CT.
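The rel_rms parity figures quoted above can be computed like this (assumed definition: RMS of the element-wise difference divided by RMS of the reference; the notes don't spell out the formula, so this is my reading):

```typescript
// Relative RMS error between a test volume and a reference volume.
// Assumed definition: sqrt(sum((test-ref)^2)) / sqrt(sum(ref^2)).
function relRms(test: Float32Array, ref: Float32Array): number {
  let diffSq = 0;
  let refSq = 0;
  for (let i = 0; i < ref.length; i++) {
    const d = test[i] - ref[i];
    diffSq += d * d;
    refSq += ref[i] * ref[i];
  }
  return Math.sqrt(diffSq / refSq);
}
```

With fp32 accumulation throughout, values near 1e-6..1e-7 (as quoted) sit at the expected fp32 rounding floor.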
✨ Dual memory mode — auto-picks per device:
Fast (smart-scatter): all N tile accumulators alive; each
patch forward runs once. Peak ~8 GB at 133 classes × 213³;
verified 71.5 s on current M1 Node WebGPU.
Safe (outer-loop): one tile's accum alive at a time; patches
straddling tile boundaries run forward 2-3×. Peak ~5 GB; ~2.3× slower
(~165 s estimated after BM48 optimization) but byte-identical labels to Fast.
Auto default: memory budget 10 GB on Apple UMA, 6 GB on discrete GPUs.
If the estimated smart-mode peak fits the budget → Fast, else Safe.
Real clinical CT (512×512×300 → canonical 239×239×200)
pushes the peak to ~10 GB, which fits M1 Pro/Max UMA in Fast mode; an 8 GB M1
or an 8 GB discrete GPU should pick Safe. 2 GB integrated GPUs still require fp16 (not implemented yet).
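The auto pick described above reduces to a small heuristic. A minimal sketch, assuming the budget numbers from these notes and an externally supplied peak estimate (the function and parameter names are illustrative):

```typescript
// Hedged sketch of the auto memory-mode selection. Budgets come from
// the notes (Apple UMA 10 GB, discrete 6 GB); the peak estimator is
// assumed to exist elsewhere and is passed in as a number.
type MemMode = "fast" | "safe";

function pickMemMode(
  estimatedFastPeakGB: number, // predicted smart-scatter (Fast) peak
  isAppleUMA: boolean,         // unified memory, e.g. Apple Silicon
): MemMode {
  const budgetGB = isAppleUMA ? 10 : 6;
  return estimatedFastPeakGB < budgetGB ? "fast" : "safe";
}
```

With the demo volume's ~8 GB Fast peak this yields Fast on Apple UMA but Safe on a 6 GB discrete budget, matching the behavior described above.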
The demo CT uses a Python-preprocessed canonical volume (bit-exact vs MONAI; this is what yields the "liver recall 99.23%" parity).
The upload path runs our on-device preprocess (ScaleIntensityRange + Orient→RAS + Spacing→1.5 mm iso, trilinear, align_corners=False);
it diverges ~5% rel_rms from VISTA3D's training align_corners=True convention, which is acceptable for upload visualization.
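The ScaleIntensityRange step of that on-device preprocess follows MONAI's semantics as I understand them: the input window [a_min, a_max] is mapped linearly to [b_min, b_max], optionally clipped. A minimal sketch (the window values in the test are placeholders, not the app's actual CT window):

```typescript
// Sketch of MONAI-style ScaleIntensityRange: linear remap of the
// input window [aMin, aMax] to [bMin, bMax], with optional clipping.
function scaleIntensityRange(
  vol: Float32Array,
  aMin: number, aMax: number, // input intensity window (e.g. HU range)
  bMin: number, bMax: number, // output range (e.g. 0..1)
  clip = true,
): Float32Array {
  const out = new Float32Array(vol.length);
  const scale = (bMax - bMin) / (aMax - aMin);
  for (let i = 0; i < vol.length; i++) {
    let x = (vol[i] - aMin) * scale + bMin;
    if (clip) x = Math.min(bMax, Math.max(bMin, x));
    out[i] = x;
  }
  return out;
}
```

The Orient→RAS and 1.5 mm trilinear resample steps are separate; the align_corners=False vs True divergence noted above comes from the resample, not from this intensity remap.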