We present algorithms for sorting and routing on two-dimensional mesh-connected parallel architectures that are optimal on average. If one processor has many packets, we asymptotically halve the best previously known running times. For the mesh, optimal algorithms are known for a load of one. We improve this to a load of eight without increasing the running time. For tori, no optimal algorithms were known even for a load of one. Our algorithm is optimal for every load. Other architectures we…