CUDA Toolkit 6.5 was released before the sm_52 architecture came into production. After sm_52 arrived, an update to CUDA 6.5 was released that enabled nvcc to generate code for sm_52. Make sure you download the newer version of CUDA Toolkit 6.5. P.S.: I would rather use the latest...

Currently, using the packaged NVIDIA-supplied toolchains, if you write CUDA code in C, the device compiler (nvcc) will be required at some point to, at a minimum, convert this C source code to valid PTX. After that point, the toolkit (which includes nvcc) is not absolutely necessary. PTX code can...
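As a sketch of that first step (the file name kernel.cu is hypothetical), generating PTX from CUDA C source looks like this:

```shell
# Stop compilation after the PTX stage; only the toolkit is needed here.
# From this point on, the driver alone can JIT-compile the PTX at load time.
nvcc -ptx kernel.cu -o kernel.ptx
```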

The primary difference is that in your CUDA case, you are statically linking to libcudart, the CUDA runtime library, which adds ~500K minimum to the executable size. The OpenCL executable is dynamically linked to libOpenCL.so, which means the size of that library does not contribute to the size of the...
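If executable size matters, nvcc can also link the CUDA runtime dynamically via its `-cudart shared` switch. A minimal sketch (the source file name is assumed):

```shell
# Default: libcudart is linked statically into the binary (~500K minimum)
nvcc vectorAdd.cu -o vectorAdd_static

# Link against libcudart.so instead, as the OpenCL build does with libOpenCL.so
nvcc -cudart shared vectorAdd.cu -o vectorAdd_dynamic

# Compare the resulting file sizes
ls -l vectorAdd_static vectorAdd_dynamic
```

The dynamically linked binary then requires libcudart.so to be present on the target machine at run time.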

The PTX file format is intended to describe a virtual machine and instruction set architecture: PTX defines a virtual machine and ISA for general purpose parallel thread execution. PTX programs are translated at install time to the target hardware instruction set. The PTX-to-GPU translator and driver enable NVIDIA GPUs to...
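That PTX-to-hardware translation can also be done offline with the ptxas tool. A sketch, assuming a kernel.ptx produced earlier and an sm_52 target:

```shell
# Translate virtual-ISA PTX into machine code (a cubin) for one real GPU
ptxas -arch=sm_52 kernel.ptx -o kernel.cubin
```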

CUB is an evolving library, which means that as new features are introduced in CUDA, CUB may evolve in newer releases to take advantage of those. If you then attempt to use a newer CUB release with an older CUDA version, you may run into compatibility issues. This usage by...

While Xcode has a plug-in mechanism, it is not publicly documented and it changes between point releases of Xcode (like from Xcode 5.0 to 5.1). You likely can't use it, and even if you figured it out, you don't want to use it. Instead, what you should do is add...

This is likely a bug in nvcc. After following @talonmies suggestion to look through nvcc.profile, I started trying combinations of profile settings and command line options. I narrowed it down to this: when --keep is on the command line AND compiler-bindir is in the nvcc.profile, the malformed cl.exe compile command...

For everyone out there using Windows and trying to get CUDA and the Intel compiler to cooperate, see my initial question for how I set up the solution. To get it to work, as per Roger Dahl's suggestion, I changed the CUDA project to a DLL. This involved the following...

One reason it generates the warning multiple times is that it is compiling your code multiple times, due to the specification of multiple targets: -gencode=arch=compute_35,code=\"sm_35,compute_35\" and -arch sm_20. If you don't need both sets of targets, you can reduce the warning messages produced and shorten your compile time by deleting...
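For example, a single-target build (the source file name mycode.cu is a placeholder) runs the device-code compilation only once, so any warning appears only once:

```shell
# One device-code pass: embed sm_35 machine code only
nvcc -gencode arch=compute_35,code=sm_35 mycode.cu -o mycode
```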

I'm copying, for reference, the salient points of the answer from the RootTalk forum that solved the problem: A key point is that the C interpreter of ROOT (CINT) requires a "CINT dictionary" for the externally compiled function. (There is no problem when compiling through ROOT, because ACLiC creates this...

Using the -G switch disables most compiler optimizations that nvcc might do in device code. For this reason, the resulting code will often run slower than code that is not compiled with -G. This is pretty easy to see by running your executable in each case through cuobjdump -sass myexecutable...
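A sketch of that comparison (mycode.cu stands in for your source file):

```shell
# Build the same source with and without -G (device debug)
nvcc -G mycode.cu -o mycode_debug
nvcc    mycode.cu -o mycode_release

# Dump the machine code (SASS) of each build and compare
cuobjdump -sass mycode_debug   > debug.sass
cuobjdump -sass mycode_release > release.sass
diff debug.sass release.sass
```

The -G build will typically show many more instructions and memory accesses for the same kernel.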

Note that in the entirety of your writeup, I do not see a question being explicitly asked, therefore I am responding to: "I am looking forward to learning what is going on here." You have a race condition on d_u, by your own statement: "in order to keep the blocks...

This is a known problem in FFTW 3.3, whereby the FFTW headers misidentify that they are being compiled with a gcc version >= 4.6, which has 128-bit floating point support. It has been reported to affect compilation with icc, and it looks like nvcc-steered compilation has the same problem. The...

Look at the makefile that comes with that cdpSimpleQuicksort project. It shows some additional switches that are needed to compile it, due to CUDA dynamic parallelism (which is essentially the cause of the second set of errors you are seeing). Go back and study that makefile, and see if you can figure out...
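As a sketch of what such a makefile typically does for dynamic parallelism (the exact switches may differ in your CUDA version): relocatable device code must be enabled and the device runtime library linked, on a compute capability 3.5+ target:

```shell
# -rdc=true enables relocatable device code (required for device-side launches)
# -lcudadevrt links the CUDA device runtime library
nvcc -arch=sm_35 -rdc=true cdpSimpleQuicksort.cu -o cdpSimpleQuicksort -lcudadevrt
```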

Solution: 1) install CUDA Toolkit 6.0, 2) downgrade the CUDA driver to 4.2.10, 3) upgrade the CUDA driver to 6.5 ONLY!!!... I still get a warning: nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release. I asked this on an NVIDIA tutorial and the answer was "Hardware...

Theano is designed to work (almost) identically on both CPU and GPU. You don't need a GPU to use Theano, and if you don't have an NVIDIA GPU then you shouldn't try installing any GPU-specific stuff at all.

Give nvcc the path to any include files it needs. You do this in the same fashion that you would for gcc/g++. The only include files that you don't have to specify this for (with nvcc) are the default ones located in /usr/local/cuda/include. So if, on your machine, helper_functions.h is located...
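For instance, if helper_functions.h lived in the CUDA samples' common include directory (the path below is an assumed example; adjust it to wherever the header actually is on your machine), the compile command might be:

```shell
# -I works exactly as it does with gcc/g++
nvcc -I/usr/local/cuda/samples/common/inc mycode.cu -o mycode
```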