What is the difference between CUDA, PhysX and Compute Shader? In my mind I think of them as three different things, but then again it is probably three ways of running code on the same hardware, right? What I've got so far is that CUDA and PhysX are nVidia products, and Compute Shader is a DirectX standard implemented on newer GPUs.

I've got the CUDA basics down: kernels, memory allocation, copying arrays back and forth, etc. Is PhysX then a wrapper on top of CUDA? And how does the compute shader fit into this?

Someone set me straight please ;) small pseudo code is much appreciated!

CUDA and Compute Shader are General Purpose GPU (GPGPU) solutions, which let you use the graphics card for non-graphics jobs. You are correct that CUDA is nVidia-only and that Compute Shader is part of DirectX. CUDA will only work on nVidia cards as far as I know, while Compute Shader will work on all DirectX 11 compliant GPUs (Compute Shader 5.x) and some DirectX 10 GPUs (Compute Shader 4.x).
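
Since pseudo code was asked for, here is a rough sketch of the execution model the two share, written as plain C++ that runs on the CPU. It is not CUDA or HLSL, and `saxpy_one`, `dispatch_saxpy` and `saxpy` are made-up names. The idea it illustrates is real, though: you write a small function for one element, and the API launches it once per index, in groups (CUDA calls them thread blocks, compute shaders call them thread groups).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUDA kernel / compute shader body:
// it handles exactly one element, identified by its global index i.
void saxpy_one(std::size_t i, float a,
               const std::vector<float>& x, std::vector<float>& y) {
    // The last group may run past the end of the data, so guard the access.
    if (i >= x.size() || i >= y.size()) return;
    y[i] = a * x[i] + y[i];
}

// CPU emulation of a 1D dispatch. On a GPU the iterations of both loops
// would run in parallel; here they simply run one after another.
void dispatch_saxpy(std::size_t groups, std::size_t group_size, float a,
                    const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t g = 0; g < groups; ++g) {
        for (std::size_t t = 0; t < group_size; ++t) {
            // global index = group index * group size + index inside the group
            saxpy_one(g * group_size + t, a, x, y);
        }
    }
}

// Convenience wrapper: rounds the group count up so every element is covered.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         std::vector<float> y, std::size_t group_size = 64) {
    const std::size_t groups = (y.size() + group_size - 1) / group_size;
    dispatch_saxpy(groups, group_size, a, x, y);
    return y;
}
```

In CUDA the index would come from `blockIdx.x * blockDim.x + threadIdx.x`; in a compute shader it would come from `SV_DispatchThreadID`. The rest, including the bounds check for the last partial group, looks much the same in both.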

PhysX is a physics engine originally developed by Ageia and later acquired and maintained by nVidia. It doesn't really have anything to do with GPGPU technologies as such. There is, however, an implementation of PhysX that can run on the GPU using CUDA, but this of course only works on nVidia cards.

There is also OpenCL as an alternative to CUDA if you want to run GPGPU jobs on non-nVidia hardware (AMD, Intel). It's really similar to CUDA, but is necessarily a more general-purpose API since it targets many more hardware platforms than CUDA.

CUDA, D3D/DirectCompute and OpenCL are all "GPGPU" APIs that communicate with the drivers. Whatever happens on the other side of the driver is irrelevant, but yes, they all make use of the same hardware.

So CUDA and Compute Shader are two different things (hardware)? But both are GPGPU technologies?

No, and yes. They generally both run on the same kind of hardware in your computer, the GPU. But that is only half of the truth.

CUDA ("Compute Unified Device Architecture") is not really a GPU API. It is nVidia's API/architecture for running compute tasks (whatever they are) on nVidia devices (whatever they are). This includes GPUs, but it also includes e.g. non-GPU Tesla racks. CUDA is also the backbone of OpenCL on nVidia cards (OpenCL on nVidia is secretly transformed into CUDA by the compiler), much like Cg is the backbone for OpenGL shaders on nVidia.

OpenGL 4 compute shader functionality and its DirectX counterpart are likely also secretly transformed into CUDA on nVidia systems. This is not documented anywhere, but it's my bet that this is just what happens (it's at least plausible). These are vendor-independent APIs for compute tasks (not graphics!) built into the DirectX and OpenGL graphics APIs. They are not required to use the same hardware (shader units in the GPU) as the graphics pipeline, but in practice they do.

Maybe CUDA uses the same SMs as Compute Shader but it is approached differently in software?

That's true, it uses the exact same shader units on the same GPU(s), just with a different API and language. In reality it's still a bit more complicated, as the driver config panel lets you dedicate GPUs to graphics and to compute tasks. So if you have more than one card and dedicate one to compute tasks, it may be "generally the same" kind of hardware but "factually different" units.

OpenCL (first mentioned by Bacterius above) is a different beast insofar as it has a greatly different design philosophy. Incidentally, it runs on the GPU on your computer too, but that is not necessarily so.

OpenCL is a framework for running compute tasks on "some hardware". It is not exactly specified what that hardware is, or even whether it is homogeneous. For example, it is in principle entirely allowable to have tasks run on the CPU, the GPU, or a special "accelerator card". Or a combination of these, all at the same time.

So much for the theory. In practice, OpenCL runs on your graphics card if you have an nVidia or AMD/ATI graphics card in your computer, and on the CPU otherwise, if you're lucky (or not at all). And if you want to share objects (e.g. images) between OpenCL and OpenGL without doing an explicit round-trip, you must create the CL context from a valid GL context, which necessarily means they run on the same GPU.
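
That "GPU if present, CPU if you're lucky, otherwise nothing" behaviour can be sketched as a small selection routine. This is only an illustration in plain C++ and does not use the OpenCL API; `DeviceKind`, `Device` and `pick_device` are hypothetical names, and a real program would get its device list from the OpenCL runtime instead of building it by hand.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a compute device, loosely modelled on the
// kinds of hardware OpenCL distinguishes (CPU, GPU, accelerator card).
enum class DeviceKind { Cpu, Gpu, Accelerator };

struct Device {
    std::string name;
    DeviceKind kind;
};

// Prefer the first GPU; fall back to the first CPU device; return nullptr
// if neither is available. Accelerators are ignored in this sketch.
const Device* pick_device(const std::vector<Device>& available) {
    const Device* cpu_fallback = nullptr;
    for (const Device& d : available) {
        if (d.kind == DeviceKind::Gpu) {
            return &d;  // a GPU always wins
        }
        if (d.kind == DeviceKind::Cpu && cpu_fallback == nullptr) {
            cpu_fallback = &d;  // remember the first CPU in case no GPU shows up
        }
    }
    return cpu_fallback;
}
```

The returned pointer refers into the vector that was passed in, so the vector has to outlive it. A caller has to handle the `nullptr` case, which corresponds to the "or not at all" situation.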

AMD released an SSE-accelerated CPU implementation of OpenCL as part of their computing SDK some years ago, and Intel has been talking about one, though I've never seen it being real (did they release something in the meantime?). Those are in any case not normally present on an end-user computer, though.

PhysX, as already explained above, is a totally different thing. As the name gives away, it does "physics" (such as simulating rigid bodies); it does not do "general computation". In that sense it is something much more "high level".
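
To make "high level" concrete, this is roughly what using a physics engine feels like compared to writing compute kernels. It is not the PhysX API: `World`, `Body`, `add_body` and `step` are hypothetical names, and the "simulation" is just a point falling onto a ground plane at height 0, using semi-implicit Euler integration as a stand-in for whatever a real engine does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical body: a point with a height above the ground and a
// vertical velocity. Real engines track far more state than this.
struct Body {
    float height;
    float velocity;
};

// Hypothetical physics world. The user adds bodies and calls step();
// where and how the maths runs (CPU, GPU, ...) is the engine's business.
class World {
public:
    explicit World(float gravity) : gravity_(gravity) {}

    // Adds a body at rest and returns its index.
    std::size_t add_body(float height) {
        bodies_.push_back(Body{height, 0.0f});
        return bodies_.size() - 1;
    }

    // Advances the simulation by dt seconds (semi-implicit Euler):
    // update the velocity first, then move using the new velocity.
    void step(float dt) {
        for (Body& b : bodies_) {
            b.velocity += gravity_ * dt;
            b.height += b.velocity * dt;
            if (b.height < 0.0f) {  // crude collision with the ground plane
                b.height = 0.0f;
                b.velocity = 0.0f;
            }
        }
    }

    const Body& body(std::size_t i) const { return bodies_[i]; }

private:
    float gravity_;
    std::vector<Body> bodies_;
};
```

The point of the sketch is the shape of the interface: you describe bodies and ask the world to advance in time, and you never see a kernel, a thread index or a buffer copy.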

So... Compute Shader is DirectX which any new nVidia, AMD or Intel GPU supports. CUDA is nVidia specific. But Compute Shader gets "compiled" to CUDA for nVidia GPUs. Does that mean the binary has different routines for different hardware?

I get the feeling Compute Shader is just a wrapper for all GPGPU technologies.....

So there is no point in learning CUDA for game programming? CUDA is hardware specific and excludes users with ATI/Intel hardware?

I wouldn't say "no point in learning CUDA". It is by far the most high-performance API available. If you are crunching a lot of numbers and need ultra-high performance, then you will certainly want to use CUDA. I don't have any concrete numbers at hand now, but I remember having heard about noticeable differences (something like 10-15%? don't nail me down on that) between CUDA and OpenCL on the same hardware, doing the same calculation.

If you want to crunch numbers and don't want to depend on one vendor, use OpenCL.

If you just want to add "some special calculation and simulation" to your graphics and you make DX11 hardware a minimum spec, you're good with using compute shader. Otherwise, use OpenCL, which works fine (with 99% of its features) on DX10 class hardware.
Or, just do the calculations in the vertex/pixel shader, if the structure of the data and the nature of the calculations allows for it. For most people, most of the time, that's just good enough, and it's the least painful.

If you just want to write a "game with some physics", you probably want to use an already functioning and tested physics engine, preferably one that doesn't run on only one specific hardware vendor. Bullet is an example of such a thing.
Really, if this is your goal, use something that already works. It's by no means a trivial task to write your own physics engine.

Chances are as a game developer you will never have to touch either CUDA or OpenCL in your life.

PhysX is just NVidia's physics engine. Usually it runs on the CPU unless a CUDA-capable product is present (NVidia GPUs, Ageia PPUs [later manufactured by NVidia] and the NVidia Tesla lineup). The Bullet physics library can also do this with both CUDA products and OpenCL products (again, all NVidia products with CUDA are supported under OpenCL, although I think Bullet puts priority on CUDA on NVidia hardware).

CUDA and OpenCL themselves are just libraries for running parallel code and maths. It just so happens that a graphics card is the perfect environment for this, although there are other options (physics processors are one alternative; they are rare now, but also well suited). OpenCL can run on a much broader range of hardware than CUDA, but CUDA can be faster (though OpenCL still manages to do some things quicker). OpenCL can also emulate in software most of the functionality it would use a GPU for, but I don't think it's advisable to do so.

Compute shaders I don't know much about, but I believe they are just used for programming extra graphics effects.