On machine 2, I have also tried to run the same test, but built from PAPI-4.2 using CUDA toolkit 4.0, and it fails the same way, so I wonder if it could be related to the Nvidia driver? I would like to know which configurations (Nvidia driver, etc.) are known to make PAPI-CVS and CUDA 4.1 work together.

Note that you are not using the CUDA component on either machine. In order to use the CUDA component for the HelloWorld example, you will need to replace the event name PAPI_FP_OPS with a native CUDA event name for your particular CUDA device. Run papi_native_avail to see the list of events that are available for the CUDA device on your machine. I would typically use a CUDA event that counts the number of instructions executed (the name should be something like CUDA.Tesla_S2050.domain_a.inst_executed).

See comment in HelloWorld.cu: /* REPLACE THE EVENT NAME 'PAPI_FP_OPS' WITH A CUDA EVENT FOR THE CUDA DEVICE YOU ARE RUNNING ON. RUN papi_native_avail to get a list of CUDA events that are supported on your machine */

Also, what does configure print when you configure the CUDA component on machine 2? Can you please provide the output?

I would also need the configure output of PAPI in order to find out what's going on.