I'm trying to help a fellow scientist glue together all the different bits and bobs. I'm not a programmer myself, which is why I don't quite get it. I expect this is simple for you: since ACML (5.3.1) advertises itself as a full implementation of BLAS and LAPACK, why do I still need CBLAS?

I mean, doesn't that defeat the purpose of getting the best possible optimisation from ACML on AMD hardware? Can someone shed a bit of light on how it works under the hood, please?

BLAS generally means the routines written in Fortran. CBLAS is simply a wrapper around these routines, providing a C interface to call them, so CBLAS performance should be virtually identical to Fortran BLAS performance. In most cases, MAGMA calls the Fortran 77 BLAS interface directly, but for a few routines (such as zdot) it is not well defined how to call the Fortran interface: Fortran doesn't mandate how function values are returned, so there are several different conventions. For these routines we use CBLAS, as it is more portable. They are not performance-critical routines.
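To make the return-value issue concrete, here is a minimal sketch in plain C. It does not call ACML, MAGMA, or any real BLAS; the `demo_*` functions are hypothetical stand-ins that I made up for illustration. The first two imitate two conventions a Fortran compiler might use for a function returning a complex number (by value, or through a hidden leading pointer argument), and the third imitates the CBLAS style, where the result is always an explicit output argument, as in `cblas_zdotu_sub`.

```c
#include <complex.h>

/* Hypothetical stand-in for a Fortran-style complex function under one
   convention: the result comes back as an ordinary return value. */
double complex demo_zdotu_by_value(int n, const double complex *x,
                                   const double complex *y)
{
    double complex sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];          /* unconjugated dot product */
    return sum;
}

/* The same routine under a different convention: the caller passes a
   hidden leading pointer and the result is written through it. A caller
   that assumes the wrong convention passes the wrong argument list. */
void demo_zdotu_hidden_arg(double complex *result, int n,
                           const double complex *x,
                           const double complex *y)
{
    *result = demo_zdotu_by_value(n, x, y);
}

/* CBLAS-style wrapper: the result is always an explicit output pointer,
   so the C caller does not depend on the Fortran compiler's choice. */
void demo_zdotu_sub(int n, const double complex *x,
                    const double complex *y, double complex *dotu)
{
    *dotu = demo_zdotu_by_value(n, x, y);
}
```

All three compute the same dot product; only the way the result travels back to the caller differs. With a real library, the CBLAS wrapper is built to match whichever convention the underlying Fortran BLAS uses, which is presumably why calling it is the more portable choice for these few routines.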