As an IS consulting firm, we would like to have our very own password-cracking machine.
Great.

Now, after some sketching and brainstorming, we concluded that GPU-based cracking is the way to go (as opposed to CPU-based cracking or rainbow tables).
The question we found hard to tackle is this: can we rent just enough power from an on-demand cloud service (Amazon's is the obvious candidate, but recommendations are welcome), or should we buy a machine ourselves and build it out with the appropriate hardware?
By the way, we are thinking of using oclHashcat-plus, if that makes any difference.

Suggestions on how to settle the debate?
Maybe some completely different approach?

7 Answers

At a 16-core GPU maximum, I'd build my own, unless oclHashcat can distribute workloads (at first look it doesn't seem to). That is assuming this thing is going to pound passwords all day, most days. If you can scale it further (or want to run a lot in parallel), or won't use it all day long, pay for it by the hour.

Now, after some sketching and brainstorming we concluded that GPU is the best way to go (as opposed to CPU or rainbow tables).

I'd argue that you should probably download and archive some rainbow tables anyway. Even though they're less likely to come in handy, a lot of poorly written software still runs silly stuff like unsalted MD5, which a rainbow table will crack a stupid amount faster. Multi-pronged approach for multiple environments.

A barebones equivalent of the EC2 GPU cloud will run you about $7k+ in metal, depending on RAM/CPU/disk requirements.

This is a simple cost/benefit analysis, so it's a business question rather than a security one, since you've already decided the merits of how to approach the security aspect. Write your code, benchmark it, and compare the numbers in a spreadsheet.

For the metal

* Cost of metal
* Cost per hour in terms of electric bill (assume under full load)
* Budget for parts replacement
* Operations per second

For the EC2

* Amazon rate per hour
* Operations per second

Compare the numbers on a month-by-month or year-by-year basis. Pick the one that makes more sense based upon expected life. Front-loaded cost, net present value, expected utilization, risk, etc. are all accounting problems.
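To make that comparison concrete, here is a minimal break-even sketch. Every number below is an illustrative assumption (hardware price, electricity, parts budget, EC2 hourly rate), not a quote; substitute your own benchmark and pricing figures.

```python
# Rough break-even sketch: when does owned metal beat renting by the hour?
# All rates below are illustrative assumptions, not real price quotes.

METAL_COST = 7000.0        # upfront hardware cost in USD (assumed)
METAL_POWER_COST = 0.15    # electricity per hour under full load, USD (assumed)
METAL_PARTS_BUDGET = 50.0  # monthly parts-replacement budget, USD (assumed)
EC2_RATE = 2.10            # cloud GPU instance per hour, USD (assumed)

HOURS_PER_MONTH = 730

def metal_monthly(utilization):
    """Running cost of owned hardware per month at a given duty cycle (0..1)."""
    return METAL_POWER_COST * HOURS_PER_MONTH * utilization + METAL_PARTS_BUDGET

def ec2_monthly(utilization):
    """Rental cost per month at the same duty cycle."""
    return EC2_RATE * HOURS_PER_MONTH * utilization

def breakeven_months(utilization):
    """Months until the metal's upfront cost is recovered by monthly savings."""
    saving = ec2_monthly(utilization) - metal_monthly(utilization)
    return float('inf') if saving <= 0 else METAL_COST / saving

# Pounding passwords 24/7, the metal pays for itself within months;
# at 10% utilization it takes years, so renting wins.
print(round(breakeven_months(1.0), 1))
print(round(breakeven_months(0.1), 1))
```

The crossover point is entirely driven by expected utilization, which is why the "all day most days" question above matters more than any single price.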

You should also consider the cost of unavailability due to EC2 failure. The probability of it happening again is small but not zero.
– this.josh, Nov 17 '11 at 20:16


Running a password-cracking job for a client is something that can wait 6-18 hours for an AWS service to come back up.
– yfeldblum, Nov 18 '11 at 2:34

@this.josh likewise the cost of replacing parts due to failure in the DIY model
– lew, Apr 5 '12 at 8:22

@this.josh I think the cost of unavailability in a password cracker is pretty limited, and unavailability on Amazon is probably lower than on a self-constructed machine.
– Jan Schejbal, Jan 2 '13 at 15:52

The upshot is this: Radeon GPUs are about 3 times faster than equivalent NVIDIA GPUs for password cracking. But EC2 uses NVIDIA GPUs, because they are better at most other GPU tasks. Moreover, EC2 doesn't just use NVIDIA GPUs, but the expensive "Tesla" versions. You can buy a $250 Radeon GPU with the same password-cracking performance as a $10,000 Tesla system. Whether you buy it or rent it from Amazon, it just doesn't make sense.

Because GPUs can make your password cracking 20 times faster than a normal CPU, there is a great benefit to using one. But because of decreasing marginal returns, there isn't much benefit to investing a lot of money in GPU number crunching.

The bottom line is that it's worth spending $500 on two Radeon 6850 cards and sticking them in your desktop for use with oclHashcat, but not worth spending more.

You don't even need a supercomputer or a cloud to check the hashes.
You could donate some hardware to the freerainbowtables.com project and request your own custom rainbow tables. (They have over 2,000 machines in their distributed project, generating rainbow tables on CPUs as well as GPUs.)

It is about risk management and cost/benefit analysis. The more risk you are willing to take, the less it will cost you to run your application. There are some basic and somewhat obvious threats that could disrupt or compromise your application's operations:

Physical security risk

This refers to the physical security threats to your application's infrastructure, such as unauthorized physical access to hardware, natural disasters, power outages, network disruptions, terrorism, etc.

IT security risk

IT administrators require physical access to the hardware, as well as virtual access to the operating systems running on it. They will need such access to servers, network equipment, power distribution circuits, monitoring software, etc. There are many security and operational risks associated with having so many different hands on your hardware and software systems.

Technical failure risks

The hardware and software can and will fail for technical reasons; these are everyday problems. When it comes to GPU computing, you will want a large cluster of GPUs (as many as you can get), but the more servers you run, the higher the odds of a random hardware failure.

For example, a network card or power supply could burn out. Or a memory leak could crash a server.

A more subtle but also common problem is electromagnetic interference from cosmic background radiation, which can cause memory corruption.

Capacity risk

As your computational requirements fluctuate, you might suddenly need to scale up by adding capacity to your GPU cluster (i.e., add more GPUs). There is a risk that your infrastructure will not be able to scale with your application's requirements.

To help understand the situation, let me give you three examples of different GPU cluster setups with different cost/benefit profiles:

Tier 4 private data center

You build an underground bunker in a friendly sovereign country with strong privacy laws, in a region of the planet that is safe from common natural disasters.

Biometric security access systems on every server rack in the data center ensure that only authorized personnel can ever physically access particular hardware.

You have two redundant mains power circuits coming from power generation plants on the grid to the data center

Each power mains circuit is backed by a UPS backup power supply and a backup generator (you have contracts with fuel supply companies to guarantee a supply of fuel to run the generators over an extended period of time)

Each device (server, switch, etc) in your data center has dual redundant power supplies, with each power supply connected to a separate mains circuit

The data center is connected to the outside world through multiple, redundant fiber optic links to backbone telecom carriers

The local network in your data center connecting the servers is fully redundant, with multiple, bound network interface cards in each server, connected to multiple, redundant switches, firewalls, etc.


Virtualization technology is used to run operating systems in virtual machines interconnected through virtual LANs, to protect against physical hardware failures. The physical servers are configured as a virtual machine cluster, so that virtual machines can be moved seamlessly from one physical machine to another.

You only use server hardware with error-correcting code (ECC) memory, to protect against electromagnetic interference from cosmic background radiation. For GPUs, this means you must use NVIDIA's expensive Tesla series of cards, which have ECC memory. The less expensive gaming cards do not, so they are prone to data corruption.

You have a 24/7 Security Operations Center, doing things like monitoring network traffic and server activity to detect and respond to intrusion threats

You have a 24/7 Network Operations Center, monitoring and maintaining the status and health of systems, etc

You have redundant cooling systems for the data center

Air filtering and air quality monitoring systems are in place to avoid dust from entering the hardware

Fire detection, fire alarm and fire suppression systems with dry suppression agents are in place

Water pipes and other water systems are avoided - no water near the servers

You keep spare hardware components on-hand in case they need to be replaced

There is plenty of spare rack space, power and cooling capacity to scale up your data center if required

All aspects of the data center are continuously audited by an internal audit team

For added security redundancy, you build one or more similar secondary underground-bunker data centers in different countries and split your cluster across multiple regions. You use global server load balancing and failover technology to run your application seamlessly across multiple data centers.

Co-location in a third-party data center

Instead of building your own data center, you could co-locate your hardware in a third-party data center. Co-location in a Tier 4 facility would be very expensive (especially if you are co-locating in multiple redundant data centers), but not as expensive as building and owning your own facility. This is less secure than owning your own private data center, but also less costly.

You still own and run your own hardware that is co-located in the racks. You can secure it yourself and upgrade it whenever you need to. There are many Tesla GPU server options available on the market from different hardware vendors, like IBM, Dell, HP, etc.

Cloud based solution (e.g. Amazon EC2)

You do not have any physical control over the hardware and server virtualization systems. All of your servers are virtual machines running in Amazon's data center. Your application's security is at the mercy of Amazon and its employees.

You can set up a Virtual Private Network within Amazon for additional security. You can also run your own physical connections to Amazon's EC2 data centers.

Amazon offers three purchasing models for GPU instances: Spot Instances, On-Demand Instances, and Reserved Instances. Spot Instances are the cheapest, but capacity is not guaranteed: you might bid on 10 GPUs in the spot market and only get 2, and you can lose your GPUs if the spot price exceeds your maximum bid. On-Demand Instances are more expensive but charge a fixed rate, and you can run them as long as you need to, though capacity is still not guaranteed. Reserved Instances are the most expensive and require an annual reservation fee for each server you want to reserve, but this guarantees that you will always have access to the number of servers you require.
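A small sketch of how those three models compare over a year. The prices here are made-up placeholders (the real rate card changes constantly); plug in current numbers before deciding.

```python
# Annual cost under the three purchasing models described above.
# All prices are illustrative assumptions, not Amazon's actual rate card.

SPOT_RATE = 0.60        # USD/hour, capacity not guaranteed (assumed)
ON_DEMAND_RATE = 2.10   # USD/hour, fixed rate (assumed)
RESERVED_FEE = 3500.0   # annual upfront reservation fee, USD (assumed)
RESERVED_RATE = 0.70    # discounted hourly rate once reserved (assumed)

def annual_cost(hours):
    """Yearly cost of one GPU instance under each model, given hours used."""
    return {
        "spot": SPOT_RATE * hours,
        "on-demand": ON_DEMAND_RATE * hours,
        "reserved": RESERVED_FEE + RESERVED_RATE * hours,
    }

# Light use favors spot; heavy, guaranteed-capacity use favors reserving.
costs = annual_cost(8000)
print(sorted(costs, key=costs.get))  # models from cheapest to priciest
```

The interesting comparison is between on-demand and reserved at high utilization: once you need guaranteed capacity for most of the year, the reservation fee pays for itself.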

The Amazon EC2 cloud is distributed across multiple geographic locations. This allows you to have multiple, redundant data centers.

You don't have to buy and maintain your own hardware, and you pay for metered usage by the hour, which can be very cost-efficient if you have large fluctuations in computational demand. If you need 100 GPUs for a few days out of the year, it would be much more cost-efficient to use the cloud than to purchase and run your own hardware all year round.

Amazon only offers NVIDIA's Tesla C2050 Fermi GPUs. There are more GPU hardware options available if you own your own hardware, including the new Kepler GPUs from NVIDIA, which are not yet available on Amazon.

Renting Amazon GPUs is very cheap if you go with Spot Instances (these typically cost around 60 cents per hour, with no upfront fee). But if you need Reserved Instances, the cost rises dramatically, to around $5.00 per hour on average.

Working with a cloud like Amazon's is more complicated than the other solutions. It requires you to spend a lot of time learning tangential issues about how the Amazon cloud works, instead of figuring out how to use GPUs to solve your actual problem.

GPU cluster in your basement

Let's say that most of the risks I've talked about are not worth mitigating for you. You are willing to put a GPU cluster in your basement to save on data center real estate costs, and you are willing to accept risks such as a pipe bursting and flooding your basement "data center".

Your application is not mission-critical, it does not require 24/7 guaranteed uptime. If a natural disaster befalls your house then you will probably be worried about bigger things than the security of your GPUs.

You don't even care about electromagnetic interference from cosmic background radiation. If your memory gets corrupted and a GPU hangs, you can just reboot the server (and you have a cluster of GPUs anyway, so losing one or two is not a big deal). You can use the latest NVIDIA GeForce Kepler-based gaming cards, which are cheaper and faster in single precision than the Tesla cards offered on Amazon. You are such a risk taker that you are willing to do all your computing in single precision instead of double precision, to take advantage of the cheaper gaming cards. So you can buy a high-end gaming GPU from an OEM for $400 per card instead of a Tesla GPU from NVIDIA for $2,500 per card, and your $400 GPUs will be significantly faster than the more expensive Teslas, meaning you can buy more GPUs with your budget.

You can take a page out of the Google playbook: instead of buying fancy servers from companies like IBM or Dell, you can build your own bare-bones, ultra-low-cost GPU servers, which means you can buy even more GPUs with your budget. For example, you can skip bells and whistles like a server case, dual power supplies, and hard drives; without cases, the servers are also easier to cool. You can build each server around an ultra-low-cost, low-power CPU (rather than big, expensive multi-core Intel CPUs that would sit mostly idle since the GPUs do the work) and put the savings into more GPUs. You can run everything on Linux to avoid software licensing fees entirely, leaving you with more money to spend on GPU hardware.

This is not a very secure solution at all, but it will get you hands-down the fastest GPU supercomputer per dollar spent. This is your biggest bang for the buck if you're not worried about running a data center in your basement, and it is the easiest environment to maintain and develop on; there is no complicated cloud technology and there are no security hindrances.

CONCLUSION:
For a password-cracking machine, I would recommend Amazon EC2 if you plan to use large numbers of GPUs for small amounts of time. If you expect to crack passwords only once in a while, but when you do you immediately want as many GPUs as you can get, go with Amazon EC2 Spot or On-Demand Instances. However, if you plan to crack passwords 24/7, i.e., to maximize the utilization of your infrastructure, then I would enthusiastically recommend the DIY route of building a GPU cluster in your basement (just be sure you have enough power capacity, or you will trip a circuit). I should also say that the DIY route is much more fun.

As a consulting firm, you might end up with a client who wants to see results by tomorrow. The only way to scale that large is to use Amazon. Building your software so it can scale out to a thousand nodes is the only way to do that (without owning all that hardware).

Using 1000 nodes for 1 day is the same cost as 100 nodes for 10 days, but you'll have the results a lot faster.
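That identity is just node-hours times an hourly rate; a few lines make it explicit (the $2.10/hour rate is an assumed placeholder):

```python
# Cloud scale-out cost identity: total cost depends only on node-hours,
# so wider runs cost the same but finish sooner.
RATE = 2.10  # USD per node-hour (assumed placeholder rate)

def job_cost(nodes, days):
    """Total cost of running `nodes` machines for `days` days."""
    return nodes * days * 24 * RATE

# Same node-hours, same bill -- but the wide run finishes 10x sooner.
print(job_cost(1000, 1), job_cost(100, 10))
```

This only holds if the workload parallelizes cleanly, which brute-force password search does almost perfectly: each node can take a disjoint slice of the keyspace with no coordination.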

Brute-force password cracking depends on the algorithm in use. For example, are you trying to break a SHA-1, SHA-2, or bcrypt password? Or is the underlying algorithm scrypt?

It is not possible (as far as I know) to GPU-accelerate a scrypt-based password hash; only CPU optimizations are available, due to the additional RAM demands the algorithm places on the cryptography.
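The memory demand is visible in scrypt's parameters: the scratch space scales as roughly 128 * r * N bytes per concurrent guess, which is what starves a GPU's thousands of parallel threads. A quick sketch using Python's `hashlib.scrypt` (OpenSSL-backed, Python 3.6+); the parameter choices, password, and salt below are illustrative placeholders:

```python
import hashlib

N, R, P = 2**14, 8, 1            # common interactive-login parameters
mem_per_guess = 128 * R * N      # scratch bytes needed per hash computation

# Password and salt are placeholders for illustration only.
digest = hashlib.scrypt(b"hunter2", salt=b"example-salt",
                        n=N, r=R, p=P,
                        maxmem=64 * 1024 * 1024, dklen=32)

# 16 MiB per in-flight guess: a GPU running thousands of guesses in
# parallel would need tens of gigabytes of fast memory.
print(mem_per_guess // 2**20, "MiB;", len(digest), "byte digest")
```

Compare that with MD5 or SHA-1, where each guess needs only a few hundred bytes of state, so all of a GPU's threads can hash flat out.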

Next, are there libraries that support what you need to do? OpenCL is popular, but performs differently on NVIDIA GPUs vs AMD GPUs. In addition, you may get better performance with vendor-specific APIs.

Finally, know that each video card performs differently: a $700 NVIDIA card is feature-rich but not very efficient at SHA-2 hashing. A comparable AMD video card will outperform the NVIDIA card at SHA-2 hashing by about 40% or more, depending on the card.

If you choose to use the Amazon GPU cluster, know that the only offered OS is CentOS, and you will need Python skills (or similar) to get your code up and running. It may not be worth it in the long run, since the GPU farm offered by Amazon includes old NVIDIA cards that are good for general math offloading but not efficient for password hashing.