As of today, Amazon EC2
is providing what they call "Cluster GPU Instances": An instance in
the Amazon cloud that provides you with the power of two NVIDIA Tesla
“Fermi” M2050 GPUs. The exact specifications look like this:

For some folks, it's possible those might make sense. However, at $2.10 an hour -- working out to nearly $20k per year -- it is extraordinarily expensive.
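The yearly figure is easy to sanity-check -- assuming the instance runs around the clock at the quoted on-demand rate:

```python
# Back-of-the-envelope check of the yearly cost quoted above,
# assuming the instance runs 24/7 at the $2.10/hour on-demand rate.
HOURLY_RATE = 2.10  # USD, from the post

yearly_cost = HOURLY_RATE * 24 * 365
print(f"${yearly_cost:,.0f} per year")  # $18,396 per year -- "nearly $20k"
```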

I wanted to try a benchmark on the machine. Alas, right now, the only AMI available is running CentOS (with which I am unfamiliar), and after mucking with it for an hour, I gave up on trying to get all the theano dependencies installed, so I have no numbers to share yet. I'll try again later when there are more OS options.

It would be great to build a public theano AMI, so that someone curious about theano could boot it up and have a functioning machine to play with immediately. (It also makes for great installation instructions: Boot machine, use.) Again, if more OS options show up at some point in the future, I'll put one together.

In the meantime, I'd be curious to hear performance numbers if someone else gets it up and running.

> I wanted to try a benchmark on the machine. Alas, right now, the only
> AMI available is running CentOS (with which I am unfamiliar), and
> after mucking with it for an hour, I gave up on trying to get all the
> theano dependencies installed, so I have no numbers to share yet. I'll
> try again later when there are more OS options.

Other AMIs are available, some with Ubuntu and atlas-enabled numpy, scipy, etc. For example: starcluster [1] provides all that, in addition to automatically configuring MPI and SGE, having load-balancing support to add/remove nodes to the queue, and plans to include PyCUDA+PyOpenCL [2], ipython's parallel processing, hadoop, etc.

> It would be great to build a public theano AMI, so that someone
> curious about theano could boot it up and have a functioning machine
> to play with immediately. (It also makes for great installation
> instructions: Boot machine, use.) Again, if more OS options show up at
> some point in the future, I'll put one together.

This would be awesome! Maybe forking a specific starcluster AMI would make sense, and we could ask Justin Riley to include it by default or as a plug-in [3,4].

>> I wanted to try a benchmark on the machine. Alas, right now, the only
>> AMI available is running CentOS (with which I am unfamiliar), and
>> after mucking with it for an hour, I gave up on trying to get all the
>> theano dependencies installed, so I have no numbers to share yet. I'll
>> try again later when there are more OS options.
>
> Other AMIs are available, some with Ubuntu and atlas-enabled numpy,
> scipy, etc. For example: starcluster [1] provides all that, in
> addition to automatically configuring MPI and SGE, having load-balancing
> support to add/remove nodes to the queue, and plans to include
> PyCUDA+PyOpenCL [2], ipython's parallel processing, hadoop, etc.

As of right now, starcluster's amis don't appear to allow you to launch a gpu instance (I tried ami-0af31963, which is their ami for ubuntu 10.04). I believe it takes extra work to make an ami compatible with a new instance type, and it just hasn't been done yet... and I'm not the right person to take it on. :)

>> It would be great to build a public theano AMI, so that someone
>> curious about theano could boot it up and have a functioning machine
>> to play with immediately. (It also makes for great installation
>> instructions: Boot machine, use.) Again, if more OS options show up at
>> some point in the future, I'll put one together.
>
> This would be awesome! Maybe forking a specific starcluster AMI would
> make sense, and we could ask Justin Riley to include it by default
> or as a plug-in [3,4].

Starcluster looks like an excellent base ami to build a theano ami on top of. And from a very cursory inspection, theano looks like it would fit excellently as a starcluster plugin. Thanks for pointing these out.
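As a sketch of what that plugin could look like: StarCluster plugins, as I read the docs, are `ClusterSetup` subclasses with a `run()` method that gets the cluster's nodes, each exposing `node.ssh.execute`. The interface details and the pinned version below (0.3.0, the release mentioned later in this thread) are assumptions, not a tested implementation:

```python
# Hedged sketch of Theano as a StarCluster plugin. The interface
# (ClusterSetup subclass with run(), nodes exposing node.ssh.execute)
# follows my reading of the StarCluster plugin docs and may not match
# the real API exactly.
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Stub base class so the sketch runs without StarCluster installed.
    class ClusterSetup(object):
        pass


class TheanoInstaller(ClusterSetup):
    """Install Theano on every node in the cluster."""

    def run(self, nodes, master, user, user_shell, volumes):
        for node in nodes:
            # Assumes pip is on the image; pinning 0.3.0 is illustrative.
            node.ssh.execute("pip install theano==0.3.0")
```

It would presumably then be registered in the StarCluster config file's plugin section, though the exact config keys are another assumption.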

This AMI is currently not compatible with StarCluster 0.91.2. However, if you just want to play around with the new GPU instances, you're probably better off launching a single instance from the AWS management console. If you need a GPU cluster, the latest github code does work with this new AMI and instance type (both cg1.4xlarge and cc1.4xlarge) if you're interested in testing.

A few notes:

1. CUDA is installed in /usr/local/cuda
2. MAGMA library is installed in /usr/local/magma
3. Custom python2.6 installation in /usr/lib64/python2.6/site-packages
4. NumPy/SciPy/PyCuda/OpenCL/etc are installed in the custom python2.6 installation
5. All software sources used are in /usr/local/src (look here for PyCuda/PyOpenCL/MAGMA examples, etc)
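To use those locations from a job, the environment needs to point at them. A hedged sketch -- the `bin`/`lib64`/`lib` subdirectory names are assumptions I have not confirmed against the AMI:

```python
# Point a job's environment at the install locations listed above.
# The subdirectory names (cuda/bin, cuda/lib64, magma/lib) are
# assumptions, not confirmed from the AMI itself.
import os

os.environ["PATH"] = "/usr/local/cuda/bin:" + os.environ.get("PATH", "")
os.environ["LD_LIBRARY_PATH"] = ":".join([
    "/usr/local/cuda/lib64",
    "/usr/local/magma/lib",
    os.environ.get("LD_LIBRARY_PATH", ""),
])
print(os.environ["PATH"].split(":")[0])  # /usr/local/cuda/bin
```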

Let me know if you have issues...

~Justin

On Nov 16, 12:46 pm, Josh Bleecher Snyder <joshar...@gmail.com> wrote:
> [snip]
>
> I'll plan on checking back on starcluster occasionally to find out

> The gpu on the computer are put in which mode? ECC memory?
> Exclusive mode to be sure nothing else run on it?

I used the cluster instance as I found it; not sure what the default config is. If there was something else running on those GPUs, it wasn't me that started it. I guess it is possible that Amazon is sharing one physical GPU across multiple instances.
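One way to answer these questions on the instance would be to dump `nvidia-smi -q` and pull out the relevant fields. A sketch -- the exact field labels ("Compute Mode", "Ecc Mode") vary across driver versions, so treat them as assumptions:

```python
# Sketch: extract settings from `nvidia-smi -q` style output.
# Field labels are assumptions; they differ between driver versions.
def parse_smi(output):
    """Turn 'Key : Value' lines from nvidia-smi -q into a dict."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

# Example with a made-up fragment of output:
sample = """\
    Compute Mode            : Default
    Ecc Mode                : Enabled
"""
modes = parse_smi(sample)
print(modes["Compute Mode"], modes["Ecc Mode"])  # Default Enabled
```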

> HostFromGpu go from 0.379s to 14.303s (slowdown by 37x!)

I'd guess this has to do with their virtualization layer.

> GpuDot22 1.851s to 9.175s (slowdown by 5x!)
> GpuCrossentropySoftmaxArgmax1HotWithBias 0.544s to 11.821s (22x slowdown!)
>
> Such a slowdown means it is worthless to use them with theano if we don't
> fix this problem.

Or if they don't, depending on the root cause. :) Unfortunately, at >$2 per hour, it strikes me as possibly a bit too expensive to leave a machine running just to try to optimize code for it.

> Are you sure your code on the GTX480 is clean? The profiles you show
> don't have the same node on them.

I'm pretty sure it was clean. One difference is that the cluster instance was running the 0.3.0 release, whereas I run theano tip on my machine.

I would be willing to spend some of my time running benchmarks on a Cluster GPU instance, if it would be of any help. Maybe there is some configuration that helps Theano perform on these machines that we can figure out and document.

Since the billing gets rounded up to the closest hour, it would be efficient if we made a list of benchmarks/tests to try before provisioning.
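To make the hour-rounding concrete: many short sessions cost far more than one batched session of the same total length.

```python
# EC2 bills by the full hour, so batching benchmarks matters.
import math

HOURLY_RATE = 2.10  # USD, the rate quoted earlier in the thread

def billed_cost(minutes):
    """Cost of one session: EC2 rounds partial hours up."""
    return math.ceil(minutes / 60) * HOURLY_RATE

print(billed_cost(5) * 10)  # 21.0 -- ten 5-minute sessions, each billed an hour
print(billed_cost(50))      # 2.1 -- the same work batched into one session
```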

On Dec 22 2010, 1:41 pm, Frédéric Bastien <no...@nouiz.org> wrote:
> could we benchmark with pycuda? Did you tried it? Or maybe nvidia own
> sdk sample as the one that check the memory bandwidth?
>
> Fred
>
> On Wed, Dec 22, 2010 at 1:40 PM, James Bergstra
> <james.bergs...@gmail.com> wrote:
> > No I don't have anything else on hand.
> >
> > I guess that's the beauty of theano - usually you don't have to worry about
> > what you'd do when it's not there to help :)
> >
> > On Wed, Dec 22, 2010 at 1:19 PM, Josh Bleecher Snyder <joshar...@gmail.com>
> > wrote:
> >> > Have you done any benchmarking of other gpu code on the AMI? If other
> >> > libs are running at full speed then it might be something about what
> >> > Theano is doing.
> >>
> >> I didn't; do you have suggestions for other good benchmarks to run?

Why do you bring this up again now? This thread was related to diagnosing a problem which I think has disappeared. I think EC2 support for GPUs has been improved within the last few months, so that GPU performance is similar to what you'd get natively.

I ran the logistic_sgd.py program on a Cluster GPU instance yesterday and I got performance worse than Josh Bleecher Snyder's "6.8s on [his] machine (GTX480/i7)". I don't remember exactly, but it was around 60-70 seconds. Better than his earlier result of 113.7s on the EC2 unit, but might warrant further investigation.
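For comparing runs like these across machines, a minimal wall-clock harness is enough. The `workload` here is a stand-in -- on the instance it would be something like the logistic_sgd.py training loop, not an actual Theano API call:

```python
# Minimal wall-clock timing harness for comparing benchmark runs.
# The workload passed in below is a stand-in for a real training loop.
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start

result, elapsed = timed(sum, range(1_000_000))  # stand-in workload
print(result)  # 499999500000
```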

Unfortunately, I don't currently have a GPU card to test against, which is why I am doing experiments using these machines.

If the expectation is that EC2 Cluster GPU instances should be as fast as native, then maybe I should just run the benchmarks listed and report back.

Interesting - the program that gave me hope was bound by GPU convolutions rather than anything else, such as host-device transfers. I should back off and say that *some* improvement was evident, but it's still certainly worth benchmarking various aspects of the EC2 GPU platform.