To benchmark the initial performance of the MPI-only CABARET code, the wall-clock time to perform 276 time steps without any I/O was measured and compared with the time taken by the serial code for the same task. The test case is a 3D backward-facing step geometry with laminar-flow boundary conditions, a Reynolds number of 5000 and a Mach number of 0.1. The grid contains 10^5 hexahedral cells. Both codes were compiled with the PGI 10.8.0 Fortran90 compiler and run on Phase 2a of HECToR, where each node is a quad-core 2.3 GHz Opteron processor with 8 GB of RAM. The serial code took 360 seconds; the parallel code took 7.8 seconds on 50 cores (13 nodes) and 2 seconds on 250 cores (63 nodes). Relative to the serial run, this corresponds to a parallel efficiency of 72% on 250 cores with fully populated nodes.
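The quoted efficiency can be reproduced from the timings above. The following is a minimal sketch, assuming the conventional definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by core count), with the serial run as the baseline. The function names are illustrative and are not part of CABARET.

```python
import math

CORES_PER_NODE = 4   # quad-core nodes on HECToR Phase 2a (from the text)
T_SERIAL = 360.0     # serial wall-clock time in seconds (from the text)


def speedup(t_serial, t_parallel):
    """Speedup relative to the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t_serial, t_parallel) / cores


def nodes_needed(cores, cores_per_node=CORES_PER_NODE):
    """Smallest number of nodes that can hold the requested cores."""
    return math.ceil(cores / cores_per_node)


# Parallel timings reported in the text: cores -> wall-clock seconds
runs = {50: 7.8, 250: 2.0}

for cores, t_par in runs.items():
    s = speedup(T_SERIAL, t_par)
    e = efficiency(T_SERIAL, t_par, cores)
    print(f"{cores} cores ({nodes_needed(cores)} nodes): "
          f"speedup {s:.1f}, efficiency {e:.1%}")
# 50 cores (13 nodes): speedup 46.2, efficiency 92.3%
# 250 cores (63 nodes): speedup 180.0, efficiency 72.0%
```

Under these definitions the 250-core run gives the 72% figure stated in the text. The roughly 92% value for 50 cores is derived here from the reported timings and is not quoted in the source.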