Differential Power Analysis (DPA) attacks allow an attacker to discover the secret key stored in a secure embedded system by exploiting the correlation between the power consumed by the device and the data being processed. The computation involved is generally simple; however, if the power traces consist of a large number of points, the processing time can be long. In this paper we aim to speed up the so-called correlation power analysis (CPA). To do so, we use the OpenCL framework to distribute the workload of the attack over a heterogeneous platform composed of a CPU and multiple accelerators. We concentrate on the computation of Pearson's correlation coefficients, as it accounts for approximately 80% of the overall execution time, and we further optimize the attack by minimizing the data transfers between the host processor and the GPUs. Our results show performance improvements of up to 9x when compared with the reference parallel implementation.
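
To make the dominant computation concrete, the following is a minimal, sequential Python sketch of the correlation step in a CPA attack. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names (`pearson`, `hamming_weight`, `cpa_best_guess`), the one-byte key, and the leakage model (Hamming weight of plaintext XOR key guess) are all hypothetical simplifications chosen for brevity. A practical attack would typically target a non-linear intermediate value such as an S-box output.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    # A constant sequence has zero variance; report no correlation.
    return num / den if den else 0.0


def hamming_weight(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def cpa_best_guess(plaintexts, traces, n_guesses=256):
    """Return (key_guess, correlation) with the highest correlation peak.

    plaintexts[i] is the known input byte of measurement i and
    traces[i][t] is sample point t of the corresponding power trace.
    Assumed (hypothetical) leakage model: HW(plaintext XOR key guess).
    """
    n_points = len(traces[0])
    # Reorganize the traces so that each sample point is one column.
    columns = [[trace[t] for trace in traces] for t in range(n_points)]

    best_guess, best_corr = None, -2.0
    for guess in range(n_guesses):
        model = [hamming_weight(p ^ guess) for p in plaintexts]
        # Signed correlation is used because, with this XOR model, the
        # complemented guess yields exactly the negated coefficient.
        peak = max(pearson(model, column) for column in columns)
        if peak > best_corr:
            best_guess, best_corr = guess, peak
    return best_guess, best_corr
```

Each (key guess, sample point) pair in the sketch is computed independently of all others, so the cost grows with the product of the number of guesses, the number of points per trace, and the number of traces. That independence is what makes this step a natural candidate for distribution across several devices.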