This only needs to compute the top-left four blocks to achieve the result. We
are still slightly wasteful on those blocks where we need only partial results.
We are also a bit wasteful in that we still need to manipulate the dask-graph
with a million or so tasks in it. This can cause an interactive overhead of a
second or two.