Abstract

We implement a three-dimensional (3D) electrostatic Particle-in-Cell (PIC) code on the HP-Convex Exemplar SPP1000. We have two principal goals. The first is to identify the best PIC algorithm for this architecture and characterize its performance. The second is to explore whether the architecture and system software provide an efficient and scalable shared-memory programming environment. We show that PIC codes can achieve good performance on the Exemplar. However, achieving this performance requires great care to minimize long latencies to remote memory and to maximize cache reuse. Together, these two requirements for avoiding latency-induced performance degradation made programming complex and significantly diminished many of the ease-of-use advantages that the shared-memory hardware provides. Our best-performing code avoided stressing the cache-coherency hardware. It achieved its best performance by storing the particle data in 'processor local' memory blocks and by periodically sorting the particle data to improve processor cache reuse.
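The abstract does not say how the periodic sort is performed. A common choice in PIC codes is a counting sort by grid-cell index. After the sort, particles that share a cell are contiguous in memory, so the charge-deposition and field-interpolation loops touch the same few grid points in succession. The sketch below is illustrative only and is not the paper's implementation. It assumes uniform cubic cells of side `h`, row-major cell numbering, and particle data stored as Python tuples. The names `cell_index` and `sort_by_cell` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Grid = Tuple[int, int, int]


def cell_index(p: Vec3, grid: Grid, h: float) -> int:
    """Map a position to a linear cell index (assumed row-major, z fastest)."""
    nx, ny, nz = grid
    # Clamp so particles on the upper boundary (or slightly outside) stay in range.
    i = min(max(int(p[0] // h), 0), nx - 1)
    j = min(max(int(p[1] // h), 0), ny - 1)
    k = min(max(int(p[2] // h), 0), nz - 1)
    return (i * ny + j) * nz + k


def sort_by_cell(pos: Sequence[Vec3], vel: Sequence[Vec3],
                 grid: Grid, h: float) -> Tuple[List[Vec3], List[Vec3]]:
    """Stable counting sort of particles by cell (hypothetical helper)."""
    ncells = grid[0] * grid[1] * grid[2]
    cells = [cell_index(p, grid, h) for p in pos]

    # 1. Histogram: number of particles in each cell.
    counts = [0] * ncells
    for c in cells:
        counts[c] += 1

    # 2. Exclusive prefix sum: first output slot for each cell.
    start = [0] * ncells
    total = 0
    for c in range(ncells):
        start[c] = total
        total += counts[c]

    # 3. Scatter each particle into its cell's block; order within a cell is kept.
    new_pos: List[Optional[Vec3]] = [None] * len(pos)
    new_vel: List[Optional[Vec3]] = [None] * len(vel)
    for n, c in enumerate(cells):
        dst = start[c]
        start[c] += 1
        new_pos[dst] = pos[n]
        new_vel[dst] = vel[n]
    return new_pos, new_vel  # type: ignore[return-value]
```

Because particles move only a fraction of a cell per time step, the ordering degrades slowly. This is why an intermittent sort, rather than a sort on every step, can be enough to recover cache locality.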