The "best way" is likely whatever you are familiar with. Any performance event profiling tool that works with C99 or C++ code built with gcc or icc should be suitable, provided it is not tied to an incompatible threading model.

I have used perf a few times to collect some simple aggregate statistics for Cilk Plus programs. I believe the Cilk Plus runtime should look like just another shared library to perf, so it should mostly "just work", the same as profiling any other pthreaded program. That said, I don't have much experience with perf myself, so there may be subtler problems that I'm not aware of.
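For what it's worth, here is a rough sketch of the kind of aggregate collection I mean, wrapped in a small Python helper around `perf stat`. The function names, the default event list (`cycles`, `instructions`), and the `./fib` binary in the usage note are placeholders I made up for illustration; `CILK_NWORKERS` is the environment variable the Cilk Plus runtime reads for its worker count. I have not checked this against every perf version, so treat it as a starting point rather than a recipe.

```python
import os
import shutil
import subprocess


def perf_stat_command(program, program_args=(), events=("cycles", "instructions")):
    """Build a `perf stat` command line that counts the given events.

    The event names are only examples; `perf list` shows what your
    machine actually supports.
    """
    return ["perf", "stat", "-e", ",".join(events), "--", program, *program_args]


def run_perf_stat(program, program_args=(), workers=None):
    """Run the program under `perf stat` and return the exit code.

    perf prints its counter summary to stderr, which is passed through
    to the terminal unchanged.
    """
    if shutil.which("perf") is None:
        raise RuntimeError("perf not found on PATH")
    env = dict(os.environ)
    if workers is not None:
        # CILK_NWORKERS sets the number of Cilk Plus worker threads.
        env["CILK_NWORKERS"] = str(workers)
    return subprocess.run(perf_stat_command(program, program_args), env=env).returncode
```

Something like `run_perf_stat("./fib", ["35"], workers=4)` would then run a (hypothetical) Cilk Plus binary with four workers and print the aggregate counts. Since perf follows child threads by default, the numbers cover all of the workers together.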