Whole-brain 133-class segmentation with batched-fast tiling:
N tile slots stay alive simultaneously, and each forward pass scatters
to all overlapping tiles at once. The worker auto-picks N from the GPU
budget (M1 16 GB UMA → N=8, ~210 s vs. v2's 607 s; Node-verified at
99.9999% byte-exact agreement).
← v2 (outer-loop tiling) · v1 (main-thread)
⚠ Download size: ~340 MB of UNesT weights will be fetched below.
Memory: N=8 batched-fast peaks at ~12 GB of GPU buffers, so it needs
Apple Silicon with 16 GB+ UMA (M1/M2/M3) or a discrete GPU with 16 GB+.
Other platforms auto-downgrade to N=2 or N=4 (or to N=1, the v2
outer-loop fallback).
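
The auto-downgrade described above could be sketched roughly as follows. This is an illustration only, not the worker's actual code: `pickTileSlots`, `estimatePeakGB`, and `ASSUMED_GB_PER_SLOT` are hypothetical names, and the linear per-slot cost is an assumption extrapolated from the single stated data point (N=8 peaks at ~12 GB). The real thresholds for N=2 and N=4 are not given here.

```typescript
// Hypothetical sketch: none of these names come from the actual worker.
// Candidate slot counts, tried from largest to smallest.
const CANDIDATE_SLOTS = [8, 4, 2, 1] as const;

// ASSUMPTION: peak GPU buffer use grows linearly with N.
// The only figure stated in the text is N=8 -> ~12 GB.
const ASSUMED_GB_PER_SLOT = 12 / 8;

// Estimated peak GPU buffer footprint (in GB) for N tile slots.
function estimatePeakGB(n: number): number {
  return n * ASSUMED_GB_PER_SLOT;
}

// Return the largest candidate N whose estimated peak fits the budget.
// If nothing fits, fall back to N=1 (the v2 outer-loop path).
function pickTileSlots(bufferBudgetGB: number): number {
  for (const n of CANDIDATE_SLOTS) {
    if (estimatePeakGB(n) <= bufferBudgetGB) {
      return n;
    }
  }
  return 1;
}
```

Under these assumed numbers, a 12 GB buffer budget selects N=8 and a 6 GB budget selects N=4. Walking a fixed candidate list from largest to smallest keeps the fallback order explicit and makes N=1 the guaranteed last resort.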