Compute Power

Compute power in the cloud

Introduction

This page describes the processing power options available on the public cloud. The basic rate per hour is
taken to be the on-demand rate, meaning that once you secure the machine you keep it for as long as you like.
There are also preemptible instances, which can be taken out of your control on short notice. These are
available in lower quantity and at considerably reduced cost (typically 20% to 40% of the on-demand rate, sometimes less).

Basic Thesis

On cost: The more powerful the cloud VM, the more it costs per hour, but it will usually complete a given task
more quickly, so there is the potential to benchmark different instance types and optimize. To first order,
however, one can take them to be cost-equivalent and simply work empirically by timing your compute tasks.
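
As a minimal sketch of the empirical approach, the cost of a task is the hourly rate multiplied by the measured runtime. The instance names, rates and timings below are made up for illustration and are not vendor data; the function name `cost_per_task` is likewise only an assumption of this example.

```python
def cost_per_task(hourly_rate_usd, runtime_hours):
    """Total cost of one task: hourly rate times measured wall-clock hours."""
    return hourly_rate_usd * runtime_hours


# Hypothetical instance types with made-up rates and timings (not vendor data).
measurements = {
    "small": {"hourly_rate_usd": 0.50, "runtime_hours": 6.0},
    "large": {"hourly_rate_usd": 2.00, "runtime_hours": 1.25},
}

# Cost of the same task on each instance type.
costs = {
    name: cost_per_task(m["hourly_rate_usd"], m["runtime_hours"])
    for name, m in measurements.items()
}

# The instance type with the lowest total cost for this task.
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per task")
print("cheapest:", cheapest)
```

In practice you would replace the made-up numbers with the vendor's published rate and your own timing of a representative task.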

On limits and runaway cost: A limit on the number of machines you can run at once is initially in place
to prevent you from running up a huge bill accidentally, for example via a typo in a configuration file. If
your script requests 20 machines but you type ‘200’ you could find yourself spending thousands of dollars
per hour. If the limit keeps you from getting the number of machines you need, contact your cloud vendor
and request a limit increase. Another common way to incur accidental charges is to allow your access keys to
wind up on GitHub. So there are pitfalls in using the cloud without knowing what you are doing,
and there is consequently a learning curve. The first rule is ‘always test at small scale before scaling
up’. The second rule is ‘know how to operate without putting your account access at risk of theft.’
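
One way to guard against the ‘200’ typo on your own side is to check every request against a cap and an hourly budget before submitting it. The sketch below is illustrative only: the function name, the cap, the budget and the per-machine rate (a hypothetical 8-GPU machine priced at 8 times a $3.06 per GPU per hour rate) are all assumptions, and it does not call any vendor API.

```python
def check_instance_request(requested, expected_max, hourly_rate_usd, budget_per_hour_usd):
    """Return the projected hourly spend, or raise ValueError if the request looks wrong."""
    if requested < 1:
        raise ValueError("requested machine count must be at least 1")
    if requested > expected_max:
        raise ValueError(
            f"requested {requested} machines, more than the expected maximum of {expected_max}"
        )
    projected = requested * hourly_rate_usd
    if projected > budget_per_hour_usd:
        raise ValueError(
            f"projected spend ${projected:.2f}/hour exceeds budget ${budget_per_hour_usd:.2f}/hour"
        )
    return projected


# Hypothetical values: 8-GPU machines at an assumed 8 * $3.06 per hour.
RATE_USD = 8 * 3.06
EXPECTED_MAX = 50
BUDGET_USD = 1000.0

# The intended request of 20 machines passes the check.
print(f"${check_instance_request(20, EXPECTED_MAX, RATE_USD, BUDGET_USD):.2f} per hour")

# The typo of 200 machines is rejected before anything is launched.
try:
    check_instance_request(200, EXPECTED_MAX, RATE_USD, BUDGET_USD)
except ValueError as err:
    print("rejected:", err)
```

A check like this complements the vendor's account limit rather than replacing it.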

GPU-based cloud instances

A comparison of V100 pricing gives preemptible rates of $0.93, $0.61 and $0.84 per GPU per hour
for AWS, Azure and Google respectively. The corresponding on-demand rates are $3.06, $3.06 and $2.95 per GPU per hour.
These figures are for the current generation, NVIDIA Tesla V100 GPUs. Prior-generation GPUs (P100, P4, K80, M60) are also available
at commensurately lower rates. These data are subject to change.
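
To put the V100 rates above in terms of a discount, the short sketch below divides each preemptible rate by the matching on-demand rate. The numbers are the ones quoted on this page; the variable and function names are only illustrative. The resulting fractions come out at roughly 30%, 20% and 28% for AWS, Azure and Google.

```python
# V100 rates in dollars per GPU per hour, as quoted on this page (subject to change).
rates = {
    "AWS": {"preemptible": 0.93, "on_demand": 3.06},
    "Azure": {"preemptible": 0.61, "on_demand": 3.06},
    "Google": {"preemptible": 0.84, "on_demand": 2.95},
}


def preemptible_fraction(vendor_rates):
    """Preemptible rate expressed as a fraction of the on-demand rate."""
    return vendor_rates["preemptible"] / vendor_rates["on_demand"]


fractions = {vendor: preemptible_fraction(r) for vendor, r in rates.items()}

for vendor, fraction in fractions.items():
    print(f"{vendor}: preemptible is {fraction:.0%} of on-demand")
```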

Note that data for Tensor Processing Units (TPUs), which are available only on Google Cloud Platform, are still pending.