ImageNet 1000-class Vision Transformer (B/16: patch=16, hidden=768,
12 blocks, 12 heads). The input is the precomputed file x.bin
(a 224×224 RGB image, ImageNet-normalized). All compute (WebGPU + WASM)
runs in a dedicated Web Worker, so the page stays responsive during
model initialization.
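
To make the numbers above concrete, here is a minimal TypeScript sketch of the shape arithmetic they imply, together with one way an ImageNet-normalized input could be produced. Several things here are assumptions rather than facts about this project: that a class token is prepended (as in the original ViT), that x.bin holds float32 values in channel-first (CHW) order, and that "ImageNet-normalized" means the commonly used per-channel mean/std constants. The function name normalizeToCHW is hypothetical.

```typescript
// Values taken from the description: ViT-B/16 on a 224x224 input.
const IMAGE_SIZE = 224;
const PATCH_SIZE = 16;
const HIDDEN = 768;
const HEADS = 12;

const patchesPerSide = IMAGE_SIZE / PATCH_SIZE;       // 14
const numPatches = patchesPerSide * patchesPerSide;   // 196
const seqLen = numPatches + 1;                        // 197, ASSUMING one class token
const headDim = HIDDEN / HEADS;                       // 64
const patchVectorLen = PATCH_SIZE * PATCH_SIZE * 3;   // 768 values per flattened RGB patch

// ASSUMPTION: x.bin stores float32 values, so its size would be 3*224*224*4 bytes.
const expectedBytes = 3 * IMAGE_SIZE * IMAGE_SIZE * 4; // 602112

// ASSUMPTION: commonly used ImageNet per-channel statistics (R, G, B).
const IMAGENET_MEAN: number[] = [0.485, 0.456, 0.406];
const IMAGENET_STD: number[] = [0.229, 0.224, 0.225];

// Hypothetical helper: converts RGBA bytes (e.g. from canvas ImageData)
// into a normalized float32 tensor in CHW order, dropping the alpha channel.
function normalizeToCHW(rgba: Uint8ClampedArray, size: number): Float32Array {
  const plane = size * size;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const scaled = rgba[i * 4 + c] / 255; // map 0..255 to 0..1
      out[c * plane + i] = (scaled - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
    }
  }
  return out;
}
```

If the actual x.bin uses a different layout (for example HWC) or different normalization constants, only the indexing and the two constant arrays would need to change; the patch and head arithmetic follows directly from the stated configuration.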