We present a new parallel algorithm for collision detection on many-core CPU and GPU platforms. Based on the notion of a p-partition front, our algorithm evenly partitions and distributes the workload of bounding volume hierarchy (BVH) traversal among multiple processing cores without the need for dynamic load balancing, while minimizing the memory overhead inherent to state-of-the-art parallel collision detection algorithms. We demonstrate the scalability of our algorithm on different benchmarking scenarios, with and without temporal coherence, including dynamic simulation of rigid bodies, cloth simulation, and random collision courses. In these experiments, we observe that performance improves nearly linearly with the number of processing cores on both CPUs and GPUs.