> The Alltoall should only return when all data is sent and received on
> the current rank, so there shouldn't be any race condition.

You're right, this is MPI, not pthreads. That should never happen. Duh!

> I think the issue is with the way you define the send and receive
> buffer in the MPI_Alltoall. You have to keep in mind that the
> all-to-all pattern will overwrite the entire data in the receive
> buffer. Thus, starting from a relative displacement in the data (in
> this case matrix[wrank*wrows]), begs for troubles, as you will write
> outside the receive buffer.

The submatrix spanning matrix[wrank*wrows][0] through
matrix[(wrank+1)*wrows-1][:] is valid only on process wrank. This is
a block distribution of the rows, like what MPI_Scatter would
produce. Since wrows equals N (the matrix width/height) divided by
wsize, the submatrix holds exactly wsize mpi_all_t blocks, one per
rank. Therefore nothing should be written outside the bounds of the
submatrix.
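
To make the counting concrete, here is a minimal sketch of the layout
I mean (not the original code). I'm assuming N is divisible by wsize,
that the matrix holds doubles, and that mpi_all_t is simply a
contiguous block of wrows*wrows doubles; the real derived type is
probably strided, but the sizing argument is the same:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8                 /* matrix width/height (assumed); run with
                               a wsize that divides N */

int main(int argc, char **argv)
{
    int wrank, wsize;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int wrows = N / wsize;  /* rows owned by this rank */

    /* Stand-in for mpi_all_t: one contiguous block of wrows*wrows
     * doubles (my assumption -- the real derived type may differ). */
    MPI_Datatype mpi_all_t;
    MPI_Type_contiguous(wrows * wrows, MPI_DOUBLE, &mpi_all_t);
    MPI_Type_commit(&mpi_all_t);

    /* Full matrix; only rows wrank*wrows .. (wrank+1)*wrows-1 are
     * meaningful on this rank (block distribution of the rows). */
    static double matrix[N][N];

    /* One block destined for every rank, wsize blocks in total. */
    double *sendbuf = malloc((size_t)wsize * wrows * wrows * sizeof *sendbuf);
    for (int k = 0; k < wsize * wrows * wrows; k++)
        sendbuf[k] = (double)wrank;

    /* Receive wsize blocks, i.e. wsize*wrows*wrows = wrows*N doubles,
     * starting at matrix[wrank*wrows][0]: that fills exactly the rows
     * up to matrix[(wrank+1)*wrows-1][N-1], so nothing lands outside
     * the submatrix. */
    MPI_Alltoall(sendbuf, 1, mpi_all_t,
                 matrix[wrank * wrows], 1, mpi_all_t, MPI_COMM_WORLD);

    if (wrank == wsize - 1)   /* even the last rank ends at row N-1 */
        printf("last element written: matrix[%d][%d]\n",
               (wrank + 1) * wrows - 1, N - 1);

    free(sendbuf);
    MPI_Type_free(&mpi_all_t);
    MPI_Finalize();
    return 0;
}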

On another note, I just ported the example to use dynamic memory, and
now I'm getting segfaults when I call MPI_Finalize(). Any idea what
in the code could have caused this?