Contrast-agnostic whole-brain segmentation (Iglesias group, MGH; Apache-2.0).
Single 5-level 3D U-Net (13.24 M params) → softmax → GaussianBlur(σ=0.5)
→ argmax. The forward pass runs on WebGPU + WASM; all post-network ops
run in SIMD128 host helpers. Verified bit-exact (100.0000 % agreement)
against ORT-CPU at 256³ on a real T1 scan.
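As a rough illustration of the softmax → GaussianBlur(σ=0.5) → argmax chain, here is a minimal 1D sketch in plain Python. It is not the project's SIMD128 helper code: the 3-tap kernel truncation, the clamped borders, the per-class blurring order, and every function name are assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_kernel(sigma=0.5, radius=1):
    """Normalized 1D Gaussian taps; radius=1 is an assumed truncation."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def blur_1d(values, kernel):
    """Blur one class's probability row, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def segment_line(logits_per_voxel):
    """softmax -> per-class blur -> argmax for a 1D row of voxels."""
    probs = [softmax(v) for v in logits_per_voxel]
    n_classes = len(probs[0])
    kernel = gaussian_kernel()
    blurred = [blur_1d([p[c] for p in probs], kernel) for c in range(n_classes)]
    return [max(range(n_classes), key=lambda c: blurred[c][i])
            for i in range(len(probs))]

# Toy input: 5 voxels, 2 classes; the middle voxel weakly prefers class 1.
row = [[4.0, 0.0], [4.0, 0.0], [0.0, 0.2], [4.0, 0.0], [4.0, 0.0]]
labels = segment_line(row)
```

In this toy row the weak, isolated class-1 preference in the middle voxel is outvoted by its neighbours after blurring, so every voxel is labelled class 0. A 3D version would apply the same 1D pass along each axis in turn, since a Gaussian blur is separable.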
✓ Direct 256³ inference, no tiling.
The fused synthseg_skip_up_conv3d.wgsl kernel eliminates decoder
level 3's 4.83 GB concatenation (cat) buffer, which would exceed WebGPU's
4 GB single-buffer cap, by reading skip_0 and dec2_bn directly with an
on-the-fly nearest-neighbor 2× upsample.
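The index arithmetic behind such a fused read can be sketched as follows. This is a toy Python model, not the WGSL kernel. It assumes that dec2_bn is the half-resolution tensor being upsampled and that the skip channels come first in the virtual concatenation; the channel counts and function names are hypothetical. The 72-channel figure is only inferred from 4.83 GB at 256³ in fp32.

```python
def fused_read(skip, low, c, z, y, x):
    """Read channel c of the virtual concat [skip, upsample2x(low)] at (z, y, x)."""
    n_skip = len(skip)
    if c < n_skip:
        return skip[c][z][y][x]  # full-resolution skip tensor, read as-is
    # Nearest 2x upsample: each low-res voxel covers a 2x2x2 output block.
    return low[c - n_skip][z // 2][y // 2][x // 2]

def repeat2(seq):
    """Duplicate every element of a sequence once, in place order."""
    return [item for item in seq for _ in range(2)]

def upsample_nearest_2x(vol):
    """Reference path: materialize the 2x nearest upsample of one [z][y][x] volume."""
    return repeat2([repeat2([repeat2(row) for row in plane]) for plane in vol])

SIZE = 4  # toy output resolution (the real layer runs at 256)
skip = [[[[1000 * c + 100 * z + 10 * y + x for x in range(SIZE)]
          for y in range(SIZE)] for z in range(SIZE)] for c in range(2)]
low = [[[[-(1000 * c + 100 * z + 10 * y + x) - 1 for x in range(SIZE // 2)]
         for y in range(SIZE // 2)] for z in range(SIZE // 2)] for c in range(3)]

# What the fusion avoids: an explicit upsample followed by a concat.
reference = skip + [upsample_nearest_2x(ch) for ch in low]

# Size of the buffer that is never allocated, assuming 72 fp32 channels at 256^3.
CAT_BYTES = 72 * 256 ** 3 * 4  # 4_831_838_208 bytes, about 4.83 GB
```

Every `fused_read(skip, low, c, z, y, x)` returns the same value as `reference[c][z][y][x]`, so a convolution can consume the two source tensors directly and the concatenated tensor never has to exist in memory.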
⚠ Memory: the full 256³ × 33-class logits accumulator is
2.21 GB in fp32. Apple Silicon UMA (16 GB+) is comfortable; a
discrete-GPU machine with less than 4 GB available to the tab may OOM.
The forward pass stays on the GPU; readback is auto-chunked into
3 MAP_READ slabs of ≈ 0.74 GB (0.69 GiB) each to dodge
the Dawn / wasm_webgpu > 2 GB binding limit.
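The memory figures can be checked with a few lines of arithmetic. The sketch below also plans a 3-way readback split. It assumes a class-major layout ([class][z][y][x]) and a split on whole class planes, neither of which is stated above; `plan_readback` is a hypothetical helper, not part of the project.

```python
VOXELS = 256 ** 3
N_CLASSES = 33
BYTES_FP32 = 4
TWO_GIB = 2 * 1024 ** 3

def plan_readback(n_classes, voxels, n_slabs, bytes_per_value=BYTES_FP32):
    """Split class planes into contiguous slabs.

    Returns a list of (first_class, class_count, byte_offset, byte_size).
    """
    base, extra = divmod(n_classes, n_slabs)
    plane_bytes = voxels * bytes_per_value
    plan, first = [], 0
    for i in range(n_slabs):
        count = base + (1 if i < extra else 0)
        plan.append((first, count, first * plane_bytes, count * plane_bytes))
        first += count
    return plan

total_bytes = N_CLASSES * VOXELS * BYTES_FP32  # 2_214_592_512 bytes, about 2.21 GB
slabs = plan_readback(N_CLASSES, VOXELS, n_slabs=3)

for first, count, offset, size in slabs:
    print(f"classes {first}..{first + count - 1}: offset {offset}, {size / 1e9:.3f} GB")
```

Under these assumptions each slab holds 11 class planes and 738,197,504 bytes (about 0.74 GB, or 0.69 GiB), well under 2 GiB, whereas the full 2,214,592,512-byte accumulator is just over it.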