Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below, using nvidia-smi, observed GPU usage, although the speed-up was not major.
Blogspot needs versioning.
I lost the entire post when it was almost complete. Screw this.

Everything compiles fine, but no GPU output during calculation.

I see no evidence of the GPU being used at any stage. Otherwise all is good -- the calcs run fine on the CPU.
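Since the GAMESS output itself doesn't report GPU timings, one way to look for GPU activity is to watch nvidia-smi in a second terminal while the job runs, which is what the commenter mentioned in the update did. A minimal sketch follows; the function name gpu_snapshot is made up for illustration, and what nvidia-smi reports depends on the card and driver.

```shell
# Print a one-off snapshot of GPU state, or a notice if nvidia-smi is absent.
gpu_snapshot() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Plain nvidia-smi prints a table with memory use and, where the
        # card supports it, utilisation and the running compute processes.
        nvidia-smi || echo "nvidia-smi failed (driver not loaded?)"
    else
        echo "nvidia-smi not found"
    fi
}

# Call this in a second terminal while the GAMESS job is running.
gpu_snapshot
```

To poll repeatedly, something like watch -n 2 nvidia-smi does the same job.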

sh install-acml-5-3-1-gfortran-64bit.sh

Where do you want to install ACML? Press return to use
the default location (/opt/acml5.3.1), or enter an alternative path.
The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1

Make sure to edit the Version field since Patch-1 leads to an error (must start with digit).
LIBCCHEM
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp and /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp. Insert

Openmpi 1.6
I can't remember why I ended up compiling it myself instead of using the stock Debian version. From here.

please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda] /opt/gamess_cuda
Setting up GAMESS compile and link for GMS_TARGET=linux64
GAMESS software is located at GMS_PATH=/opt/gamess_cuda
Please provide the name of the build locaation.
This may be the same location as the GAMESS directory.
GAMESS build directory? [/home/me/tmp/gamess]
Please provide a version number for the GAMESS executable.
This will be used as the middle part of the binary's name,
for example: gamess.00.x
Version? [00] 12r2
Please enter your choice of FORTRAN: gfortran
gfortran is very robust, so this is a wise choice.
Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6:
4.6

Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': atlas
Please enter the Atlas subdirectory on your system: /opt/ATLAS/lib
Math library 'atlas' will be taken from /opt/ATLAS
If you have an expensive but fast network like Infiniband (IB), and
if you have an MPI library correctly installed,
choose 'mpi'.
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi

Please enter your openmpi's location: /opt/openmpi/1.6
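The config script above asks for only the first decimal place of the gfortran version. As a rough sketch, that value can be cut out of the full version string like this; first_decimal is a hypothetical helper name, and the commented-out call assumes gfortran is on your PATH.

```shell
# Reduce a full version string such as 4.6.3 to major.minor (4.6).
first_decimal() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

first_decimal 4.6.3
# On a machine with gfortran installed:
# first_decimal "$(gfortran -dumpversion)"
```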

Build Gamess US

cd /opt/gamess_cuda/ddi/
./compddi
cd ../

Edit comp

# see ~/gamess/libcchem/aaa.readme.1st for more information
set GPUCODE=true
if ($GPUCODE == true) then
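The edit to comp can also be scripted. The sketch below assumes the stock file contains the line set GPUCODE=false (only the edited state is shown here, so that is a guess) and keeps a backup; enable_gpucode is a made-up name.

```shell
# Flip 'set GPUCODE=false' to 'set GPUCODE=true' in the given file.
# Assumes the stock file has the line with 'false'; a .bak copy is kept.
enable_gpucode() {
    cp "$1" "$1.bak" &&
    sed 's/^\([[:space:]]*set GPUCODE=\)false/\1true/' "$1.bak" > "$1"
}

# Usage (path taken from the post):
# enable_gpucode /opt/gamess_cuda/comp
# grep -n 'set GPUCODE' /opt/gamess_cuda/comp
```

Checking the result with grep afterwards is worthwhile, since the substitution silently does nothing if the line looks different in your version.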

26 April 2013

NOTE: with ACML my performance on my FX8150 and FX8350 nodes is only 25% of that with Openblas (double precision). Yes, for some reason gromacs is four times faster with openblas than with the machine vendor libraries in my tests.

Note that GPU calcs only speed things up under certain, specific conditions -- and not all nvidia cards are supported (or equal). My own set-up, using statically cooled graphics cards, is definitely not appropriate for a GPU cluster. Once nwchem comes out with GPU support I might upgrade to fancier $200 graphics cards (maybe COSMO in NWChem will finally become more reasonable in terms of computational cost), but there's little reason for that at the moment.

Not all cards are created equal either -- e.g. the GT210, which has compute capability 1.2, is too limited to use with gromacs. The GT430 (compute capability 2.1) works. Obviously, neither is viable for professional work.
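To make the comparison concrete, here is a small sketch that checks a card's compute capability against a minimum. The 2.0 threshold is my assumption about what gromacs 4.6 needs, the 2.1 value for the GT430 is my understanding of that card, and the function names are invented for the example.

```shell
# Succeeds if compute capability $1 is at least $2 (both as major.minor).
cc_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        ok = (h[1] + 0 > n[1] + 0) || (h[1] + 0 == n[1] + 0 && h[2] + 0 >= n[2] + 0)
        exit(ok ? 0 : 1)
    }'
}

# Assumed minimum of 2.0; adjust if your gromacs version differs.
check_card() {
    if cc_at_least "$2" 2.0; then echo "$1: ok"; else echo "$1: too old"; fi
}

check_card GT210 1.2
check_card GT430 2.1
```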

Also note that it seems that you still need to use OPENMM if you want GPU support for implicit solvation.

CUDA: If you want to build with cuda you need gcc-4.6, which is still available in the wheezy repos. 4.7 won't work. Luckily, you can have both on your system, but you'll need to specify CC and CXX as shown below.
Openblas
Note that the link to the openblas file tends to die after a while, so you might have to download it manually.

to your ~/.bashrc
[for later use with nwchem and ecce, add /opt/openblas/lib to /etc/ld.so.conf and do sudo ldconfig -- you might want to make libopenblas.so and libopenblas.so.0 symlinks to the main lib, libopenblas_bulldozer-r0.2.6.so]
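The symlink and ldconfig steps in the note above could be sketched as follows. The helper name link_openblas is hypothetical; the paths and library name are the ones from the note, and the root-only steps are left as comments.

```shell
# Create libopenblas.so and libopenblas.so.0 as symlinks to the main
# library inside the given directory.
link_openblas() {
    libdir=$1
    mainlib=$2
    ( cd "$libdir" &&
      ln -sf "$mainlib" libopenblas.so &&
      ln -sf "$mainlib" libopenblas.so.0 )
}

# Usage with the names from the post (needs root for /opt and /etc):
# link_openblas /opt/openblas/lib libopenblas_bulldozer-r0.2.6.so
# echo /opt/openblas/lib | sudo tee -a /etc/ld.so.conf
# sudo ldconfig
```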
single-precision gromacs 4.6 with both CPU and GPU

CUDA
If you have an nvidia card and want to enable GPU calcs, do

sudo apt-get install nvidia-cuda-toolkit gcc-4.6 g++-4.6
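With gcc-4.6 installed next to the default compiler, pointing a build at it is a matter of setting CC and CXX. A minimal sketch, assuming the Debian binary names gcc-4.6 and g++-4.6; use_gcc is just an illustrative wrapper.

```shell
# Point CC and CXX at a specific gcc/g++ version for the current shell.
use_gcc() {
    CC="gcc-$1"
    CXX="g++-$1"
    export CC CXX
}

use_gcc 4.6
echo "CC=$CC CXX=$CXX"
# Build tools such as cmake or configure scripts generally honour these.
```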

If /usr/lib/libcuda.so is nothing but a symlink to /usr/lib/libcuda.so.1, and the file /usr/lib/libcuda.so.1 is missing (this was the case on my wheezy amd64), then do

Note also that the ns/day values depended heavily on how long I let each calc run, and since I didn't time the runs or make them all run for the same length of time, I suspect that auto, GPU and gpu_cpu are all about the same.