There is a parallel bug in shared-memory and serial-with-partitions runs when both TS_MPDATA and NS_PERIODIC are activated. I have been looking for this bug for a while and have not been able to find and fix it. It is a very weird bug: it appears in all the N-S periodic test cases distributed in ROMS, and only in those. It behaves differently from case to case, which makes it more difficult to track down, and each test case gives us a very different clue. The only thing in common is that the bug doesn't start right away but appears after several timesteps, which makes it much harder to find.

I have never seen a parallel bug of this kind before, so it is new territory for me. It is really weird to have parallel bugs associated with round-off. In my experience, parallel bugs always appear right away. All clues tell me that the problem is in mpdata_adiff.F. However, I am starting to suspect that it is somewhere else. I have done the backward parallel dependency analysis for the private and global arrays in this routine several times and rewritten the ranges several times, and I still get the parallel bug.
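For readers unfamiliar with what the range analysis is checking, here is a toy sketch in Python. It is not ROMS code and has nothing to do with the actual contents of mpdata_adiff.F. The functions step_serial and step_tiled, the halo parameter, and the two-stage stencil are all made up for illustration. The idea is that a private work array filled in stage 1 must cover every index that stage 2 reads, including the point that wraps around the periodic boundary.

```python
def step_serial(a):
    """One two-stage update on the whole periodic domain (hypothetical stencil)."""
    n = len(a)
    # Stage 1: differences; a[i - 1] wraps to a[-1] at i = 0 (periodic).
    f = [a[i] - a[i - 1] for i in range(n)]
    # Stage 2: each point reads f[i] and f[i + 1], wrapping at the last point.
    return [f[(i + 1) % n] - f[i] for i in range(n)]


def step_tiled(a, ntiles, halo):
    """Same update done tile by tile, each tile with a private work array.

    Assumes len(a) is divisible by ntiles. The private array is filled
    only over istr .. iend + halo, so halo = 1 is needed for stage 2.
    """
    n = len(a)
    size = n // ntiles
    out = [0.0] * n
    for t in range(ntiles):
        istr, iend = t * size, (t + 1) * size - 1
        f = {}  # private to this tile
        for i in range(istr, iend + halo + 1):
            j = i % n  # periodic wrap for the last tile
            f[j] = a[j] - a[j - 1]
        for i in range(istr, iend + 1):
            # An entry that was never computed reads as 0.0 here,
            # standing in for a stale value in a real work array.
            out[i] = f.get((i + 1) % n, 0.0) - f[i]
    return out


a = [float(i * i) for i in range(8)]
print(step_tiled(a, 2, 1) == step_serial(a))  # range covers stage 2
print(step_tiled(a, 2, 0) == step_serial(a))  # range one point short
print(step_tiled(a, 1, 0) == step_serial(a))  # single tile hides the error
```

In this toy the short range fails on the very first step, which is the behavior I would normally expect from a range error. Note also that with a single tile the too-short range still gives the serial answer, because the wrapped point happens to lie inside the tile.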

TS_MPDATA is still fine with any other type of boundary condition. If you are using TS_MPDATA with NS_PERIODIC, you need to run either in serial with no partitions or in distributed-memory (MPI). We will continue hunting for this elusive parallel bug.

By the way, the combination of TS_MPDATA and EW_PERIODIC works fine for any partition in shared-memory, distributed-memory, and serial with partitions.
