Ultrasound simulation is a critical component of model-based treatment planning for ultrasound therapy. However, the domains are typically thousands of wavelengths in size, leading to large-scale numerical models with 10s of billions of unknowns. This paper presents a novel local Fourier basis domain decomposition for full-wave ultrasound propagation simulations with a custom bell function which ensures that the numerical error stays below 0.1% while enabling almost ideal strong scaling. Realistic benchmarks with 512 Nvidia P100 GPUs in the best EU supercomputer Piz Daint achieved efficiency between 90 and 100% with a speed-up over 100 and computational cost reduction by a factor of 12 compared to 1024 Haswell cores.