Here it is: CUDA 4.0 RC has just been released to NVIDIA registered developers.

Interesting stuff from the CUDA manual:

Layered Textures Support (GL_TEXTURE_1D/2D_ARRAY) : New tex.a1d/.a2d modifiers in PTX. Unfortunately, the surface instructions do not support them yet, Grrrr
Layered textures are created using cudaMalloc3DArray() with the cudaArrayLayered flag. There are new cudaTextureType1DLayered/cudaTextureType2DLayered texture sampler types and tex1DLayered()/tex2DLayered() access intrinsics.
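
To make the layered texture workflow above concrete, here is a minimal sketch using the CUDA 4.0-era texture reference API. The names layeredTex, sampleLayers and the 8x8x4 sizes are made up for illustration, error checking is left out for brevity, and texture references were later deprecated and removed in CUDA 12, so treat this as an illustration of the 4.0 API rather than something to build on a current toolkit.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical texture reference for a layered 2D texture (CUDA 4.0-era API).
texture<float, cudaTextureType2DLayered, cudaReadModeElementType> layeredTex;

// Copies every texel of every layer into a flat output buffer.
__global__ void sampleLayers(float *out, int width, int height, int layers)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    for (int layer = 0; layer < layers; ++layer) {
        // +0.5f addresses the texel center with unnormalized coordinates.
        out[(layer * height + y) * width + x] =
            tex2DLayered(layeredTex, x + 0.5f, y + 0.5f, layer);
    }
}

int main()
{
    const int width = 8, height = 8, layers = 4;   // illustrative sizes
    const size_t count = (size_t)width * height * layers;

    std::vector<float> input(count), output(count);
    for (size_t i = 0; i < count; ++i) input[i] = (float)i;

    // Allocate a layered array: the extent depth is the number of layers.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *array = 0;
    cudaMalloc3DArray(&array, &desc,
                      make_cudaExtent(width, height, layers),
                      cudaArrayLayered);

    // Upload all layers in one 3D copy (extent is in elements for arrays).
    cudaMemcpy3DParms copy = {0};
    copy.srcPtr   = make_cudaPitchedPtr(&input[0], width * sizeof(float),
                                        width, height);
    copy.dstArray = array;
    copy.extent   = make_cudaExtent(width, height, layers);
    copy.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copy);

    // Point sampling, clamped, unnormalized coordinates.
    layeredTex.filterMode     = cudaFilterModePoint;
    layeredTex.addressMode[0] = cudaAddressModeClamp;
    layeredTex.addressMode[1] = cudaAddressModeClamp;
    layeredTex.normalized     = false;
    cudaBindTextureToArray(layeredTex, array, desc);

    float *devOut = 0;
    cudaMalloc((void **)&devOut, count * sizeof(float));
    sampleLayers<<<dim3(1, 1), dim3(width, height)>>>(devOut, width, height, layers);
    cudaMemcpy(&output[0], devOut, count * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare one fetched texel against the value that was uploaded.
    const size_t probe = (2 * height + 1) * width + 3;  // layer 2, y = 1, x = 3
    printf("uploaded %f, fetched %f\n", input[probe], output[probe]);

    cudaUnbindTexture(layeredTex);
    cudaFreeArray(array);
    cudaFree(devOut);
    return 0;
}
```

Note that the layer index is passed as an integer and is never filtered, so sampling does not blend between layers the way a 3D texture would blend between slices.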

New .address_size PTX specifier : Lets you specify the address size (32b/64b) used throughout a PTX module.

Inline PTX assembly: This feature has been present since CUDA 2.x but was not officially supported. It's now fully supported and documented :-D
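
For anyone who has not tried inline PTX yet, here is a small sketch of what it looks like. It reads the %laneid special register through an asm() statement, which uses GCC-style operand constraints ("=r" for a 32-bit register output). The helper name laneId, the kernel writeLaneIds and the launch size are my own choices for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the calling thread's lane index within its warp.
__device__ unsigned int laneId()
{
    unsigned int id;
    // %% escapes the percent sign of the PTX special register name;
    // %0 refers to the first operand, bound to 'id' as a 32-bit register.
    asm("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}

// Each thread stores its lane index at its own position.
__global__ void writeLaneIds(unsigned int *out)
{
    out[threadIdx.x] = laneId();
}

int main()
{
    const int n = 64;               // illustrative: two warps of 32 threads
    unsigned int host[n];
    unsigned int *dev = 0;

    cudaMalloc((void **)&dev, n * sizeof(unsigned int));
    writeLaneIds<<<1, n>>>(dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("thread %2d -> lane %u\n", i, host[i]);

    cudaFree(dev);
    return 0;
}
```

Input operands go after a second colon, for example asm("add.s32 %0, %1, %2;" : "=r"(sum) : "r"(a), "r"(b)); for a 32-bit integer add.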