A very sensitive application has to protect several different forms of data, such as passwords, credit cards, and secret documents - and encryption keys, of course.
As an alternative to developing a custom solution around (standard) encryption and key management processes, the purchase of an HSM (Hardware security module) is under consideration.

Obviously this would depend (at least in part) on the specific application, company, data types, technologies, and budget - but I would like to keep this generic, so I am leaving it at a high-level view, and ignoring a specific workflow.

Let's just assume there is secret data that needs to be encrypted, and we are looking to hardware-based solutions to manage the complexity and probable insecurity of bespoke key management, and mitigate the obvious threats against software-based encryption and keys.

What are the factors and criteria that should be considered, when comparing and selecting an HSM? And what are the considerations for each?

For example:

Obviously cost is a factor, but are there different pricing models? What should be taken into account?

Are some products more suited to different forms of encryption (e.g. symmetric vs asymmetric, block vs stream, etc)?

Same for different workflows and lifecycles

What level of assurance, e.g. what level of FIPS 140-2, is required when?

Network-attached vs server-attached

etc

TO BE CLEAR: I am NOT looking for product recommendations here, rather just for how to evaluate any specific product. Feel free to name products in comments, or better yet in the chatroom...

Is there a limit to the number of keys it supports, and could that limit be a problem?

How easy is it to add another HSM when your application becomes more demanding (size, speed, geographic distribution...)?

Redundancy - when one HSM breaks, how much of an impact does it have on your operations, how easy is it to replace without loss of service, etc.?

Backups - how easy is it to automate and restore? Do you need to independently protect the backup's confidentiality and/or integrity or does the product ensure that? How likely are you to end up in a position where you've irrecoverably lost your data (how many factors need to be lost / forgotten, HSMs died, etc).

vendor proprietary (probably the most flexible/powerful/secure-if-you-know-what-you're-doing but increases cost to move to another vendor), and whilst C is probably a given, does it have bindings for your preferred language?

A related note: is there guidance on integration with your application (e.g. DBMS, OS services)?

OS / hardware support

Management options - what GUI / command line tools are there for doing management tasks - i.e. anything that you do infrequently enough to not want to automate (key generation?; authentication factor management?). Do your admins need to be physically present to commission the device or perform additional tasks after commissioning?

Programmability - most of your development will likely be on the other end of one of the APIs, but sometimes it is useful to be able to write applications that run on the device for greater flexibility or speed (see Thomas' answer)

Physical security - how resistant to direct physical attack does your solution need to be (bearing in mind not just the HSM but the whole solution)? If for whatever reason you decide it is particularly important (your HSM is exposed but your clients aren't, or disclosure of the keys is far worse than merely being able to use the keys for nefarious purposes - ref DigiNotar?) then you might want to look for active tamper detection and response, not just passive tamper resistance and evidence.

Algorithms - does the HSM support the crypto you want to use (primitives, modes of operation and parameters e.g. curves, key sizes)?

Authentication options - passwords; quorums; n-factors; smartcards; OTP; ... You should probably at least be looking for something that can require a configurable quorum size of token+password authenticated users before allowing operations using a key.

Policy options - you might want to be able to define policies such as controlling whether: keys can be exported from the HSM (wrapped or unencrypted); a key can only be used for signing/encryption/decryption/...; authentication is required for signing but not verifying; etc.
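To make the policy questions concrete, here is a minimal sketch of the kind of per-key policy you would want to be able to express. Every name in it (`Usage`, `KeyPolicy`, `may_use`) is hypothetical and made up for illustration; real HSMs express this through their own attributes and tools, so treat it as a checklist in code form rather than any product's API.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Usage(Flag):
    """Operations a key may be used for (hypothetical names)."""
    SIGN = auto()
    VERIFY = auto()
    ENCRYPT = auto()
    DECRYPT = auto()


@dataclass(frozen=True)
class KeyPolicy:
    allowed: Usage                    # operations the key may perform at all
    needs_auth: Usage                 # operations that also require authentication
    exportable_wrapped: bool = False  # may leave the device encrypted under another key
    exportable_plain: bool = False    # may leave the device unencrypted


def may_use(policy: KeyPolicy, op: Usage, authenticated: bool) -> bool:
    """Return True if `op` is permitted under `policy` for this caller."""
    if op not in policy.allowed:
        return False
    if op in policy.needs_auth and not authenticated:
        return False
    return True


# Example from the text: authentication required for signing but not verifying.
signing_key = KeyPolicy(
    allowed=Usage.SIGN | Usage.VERIFY,
    needs_auth=Usage.SIGN,
)
```

When evaluating a product, the question is whether each field above can be set per key and whether the device itself enforces it, rather than the client library.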

Audit capability - including both HSM-like operations (generated key, signed something with key Y) and handling crashes (ref g3k's comment). How easy is it going to be to integrate the logs into something like Splunk (sane log format, syslog/snmp/other network accessible - or at least non-proprietary - output)?
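As a rough idea of what a "sane log format" could look like, the sketch below emits one JSON object per audit event. The field names (`ts`, `event`, `key_id`, `actor`, `ok`) are my own assumptions, not any vendor's schema; the point is only that one self-describing line per event is easy to forward and parse.

```python
import json
from datetime import datetime, timezone


def audit_line(event, key_id, actor, ok, when=None):
    """Serialise one audit event as a single JSON line (hypothetical schema)."""
    when = when or datetime.now(timezone.utc)
    record = {
        "ts": when.isoformat(),  # ISO 8601 timestamp with UTC offset
        "event": event,          # e.g. "key_generated", "sign"
        "key_id": key_id,        # identifier of the key, never the key itself
        "actor": actor,          # who authenticated for the operation
        "ok": ok,                # whether the operation succeeded
    }
    return json.dumps(record, sort_keys=True)
```

A line like this can be handed to the standard library's `logging.handlers.SysLogHandler` or any other forwarder. If the HSM can only produce a proprietary binary log, you end up writing and maintaining a converter yourself.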

Form factor:

network-attached (for larger scale deployments, particularly where multiple applications/servers/clients need to make use of the keys);

desktop (for individual use; performance, availability and scalability not a big concern but cost is, especially good if your solution requires lots of people needing direct access to an HSM);

PCI (-express) (cheaper than network attached; more effort involved in making available to multiple applications);

Certifications - do you need any / do you want any because they give you confidence in the product's security? Ignoring what you need for regulatory reasons:

FIPS 140-2 provides useful confirmation that the NIST-approved algorithms work and have run-time known answer tests (check the Security Policy to see which algorithms are approved), but don't put much stock in it as evidence that the product is otherwise secure. My rule of thumb is that Level 3 hardware security means people with only a couple of minutes' access to the device will be hard pressed to compromise it. FIPS 140-2 Level 3 is the de facto baseline certification for HSMs - be wary if a product doesn't have it (though that's not to say you need to use it in a FIPS-compliant way).

Common Criteria evaluations are flexible in the assurance they provide: read the Security Target! There are no decent HSM Protection Profiles yet, so at the least you're going to have to read the Security Problem Definition (threats and assumptions) before you have an idea what the evaluation is providing.

PCI-HSM will be useful if you're in the relevant industry

Aside from certifications, how does the vendor look on security? Having CC EAL4 certs is a good starting point, but remember Win2k has those too... Do they make convincing noises about supply chain integrity, Secure Software Development Lifecycle, ISO2700x, or something like The Open Group's Trusted Technology Provider Framework?

Do you like the vendor's policy on disclosure?

Support (options, reputation, available in your language)

Services - if you have a complex requirement, it might be advantageous to have the vendor involved in your configuration/programming.

Documentation:

High level documentation - HSMs are complex general purpose products that can require somewhat involved management; good documentation is important to allow you to develop a secure and workable process around them (see Thomas' answer for more discussion).

API documentation - good coverage, preferably including good examples of common (and complex) tasks

+1, fantastic! This is exactly what I was hoping for, thanks!
–
AviD♦ Jun 2 '13 at 10:27

Can you elaborate on some of the non-trivial items? E.g. when should I prefer network-attached, as opposed to desktop? And so on: some criteria for judging those elements and knowing what tradeoffs to make (just at a high level...). Also @Thomas mentioned backupability, I think that's the only one missing.
–
AviD♦ Jun 2 '13 at 10:29

Michael this is wonderful, thanks! I would upvote you again if I could... instead, remind me I owe you a beer. :-)
–
AviD♦ Jun 3 '13 at 10:19

"Logical security model - can malicious entities on the network abuse your HSM? Malicious processes on the host PC?" To add to that, how your HSM handles these events. My org's HSM will turtle up completely when it detects an event and doesn't log a lot of things (nature of the beast, PCI, etc). We have redundant HSMs with failover. We had an issue where a single ARP was bringing them down, staggered throughout the day. We ran around for a week trying to figure it out.
–
g3k Jun 3 '13 at 17:04

An HSM will not avoid complexity; rather, it will add quite a lot of complexity to the whole system.

What HSMs do best is key storage: the key is in the HSM and does not get out of it, ever. However, you still have to worry about the key life cycle. With a "software" key, stored in a file or in the entrails of the operating system, backups are a vulnerability (you don't want to have many copies of the key floating around). With the HSM, this vulnerability is avoided, but backups become a major headache: losing the key is also a major risk, especially for encryption (if you lose the encryption key, you lose the data). So that's a first item to look at for an HSM: backup procedures. I have some experience with Thales (nCipher) HSMs, which do it like this: the keys are actually stored as encrypted files (which can be saved just like any file), and the decryption key for those files can be rebuilt with a quorum of administrator smart cards (within a new HSM).
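To give a feel for what "a quorum of administrator smart cards" means, here is a minimal sketch of Shamir's secret sharing, one textbook way of getting a k-of-n property. I am not claiming that this is how Thales or any other vendor implements its card sets; the function names and the choice of prime are assumptions made for the example.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit secret


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any k of them rebuild it."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest coefficient first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Rebuild the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split(key, 3, 5)`, any three of the five card holders can restore the backup into a new HSM, and you can lose two cards before the data is gone for good. When evaluating a product, ask what k and n can be set to and whether they can be changed later.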

HSMs rarely do bulk symmetric encryption. It does not make much sense, actually, to do symmetric encryption with an HSM: you use encryption because the data is confidential. Logically, if the need for secrecy is such that the symmetric key must not leave the HSM, then the data itself should not leave it either. Also, symmetric encryption means that both encryption and decryption use the same key: if that key is in the HSM, then encryption and decryption will both have to go through it.

HSMs are better used with hybrid encryption: the HSM stores and uses the private key of an asymmetric encryption system; when data is to be encrypted, whoever has the data generates a random symmetric key K, encrypts the data with K, and encrypts K with the public key corresponding to the HSM-stored private key. In that sense, HSMs operate as (oversized, overpriced) smart cards.
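The flow can be sketched in a few lines. This is a deliberately toy illustration that uses only the Python standard library: textbook RSA without padding over two well-known Mersenne primes (trivially factorable) and a home-made SHA-256 keystream. None of it is secure, and the names (`ToyHSM`, `unwrap`, `keystream_xor`) are invented for the example; a real system would use something like RSA-OAEP plus an authenticated cipher from a vetted library, with the private-key operation done inside the HSM.

```python
import hashlib
import secrets

# TOY parameters: the product of two Mersenne primes. Illustration only.
P, Q = 2**521 - 1, 2**127 - 1
N, E = P * Q, 65537  # the public key


class ToyHSM:
    """Stands in for the HSM: holds the private exponent and never returns it."""

    def __init__(self):
        self._d = pow(E, -1, (P - 1) * (Q - 1))

    def unwrap(self, wrapped_key: int) -> bytes:
        """The only private-key operation exposed: recover K from its wrapped form."""
        return pow(wrapped_key, self._d, N).to_bytes(32, "big")


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt(data: bytes) -> tuple[int, bytes]:
    """Done by whoever has the data; needs only the public key (N, E)."""
    k = secrets.token_bytes(32)                    # random symmetric key K
    wrapped = pow(int.from_bytes(k, "big"), E, N)  # K encrypted with the public key
    return wrapped, keystream_xor(k, data)         # data encrypted with K


def decrypt(hsm: ToyHSM, wrapped: int, ciphertext: bytes) -> bytes:
    k = hsm.unwrap(wrapped)  # one small asymmetric operation in the HSM
    return keystream_xor(k, ciphertext)  # bulk work happens on the host
```

Note what goes through the HSM: one short private-key operation per message, regardless of how large the data is. That is why the asymmetric operations-per-second figure usually matters more than bulk throughput.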

Of course, there is another extreme, in which you fit your entire application within the HSM. This requires a programmable HSM, and that's a completely different context. Thales HSMs allow that as an option (it's called "CodeSafe" and "SEE"), which they don't give away for free... and don't expect to run traditional code in it. HSMs have crypto accelerators, but they are otherwise fairly limited embedded systems (think 60 MHz ARM CPU at best: HSM shielding is at odds with heat dissipation). You can fit relatively complex code in an HSM (if it allows for it) but it is a specific programming effort. Also, some HSMs don't allow it at all.

Though HSMs are expensive, the biggest cost of an HSM is operations: they entail a lot of procedures for installing, configuring, operating, restoring and retiring. You will need people. My main criterion would then be: procedures. A good HSM will come with a detailed usage manual which describes how things should be done. It's not the hardware which matters, but how you use it.

Certifications, like EAL 4+ or FIPS 140-2 Level 3, may be required for regulatory purposes. You rarely choose whether you need it or not; that's a requirement from the intended usage context. Obtaining such a certification is a very long and expensive process, so you won't do that by yourself. On the other hand, you might want to broaden your shopping area: if HSMs are mainly big smart cards, smart cards might be usable in lieu of the HSM. A 20 EUR smart card can be FIPS 140-2 Level 3; it will compute only one RSA-2048 decryption per second instead of 500, but that may be sufficient for you.
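Whether "one per second instead of 500" is sufficient is simple arithmetic. The helper below is a back-of-the-envelope sketch: the device rates are the rough figures quoted above, while the workload numbers in the usage note are made-up assumptions that you would replace with your own peak load.

```python
import math


def devices_needed(required_ops_per_sec: float,
                   device_ops_per_sec: float,
                   spares: int = 0) -> int:
    """Devices needed to sustain a given rate of private-key operations."""
    if required_ops_per_sec < 0 or device_ops_per_sec <= 0:
        raise ValueError("rates must be non-negative (and positive per device)")
    return math.ceil(required_ops_per_sec / device_ops_per_sec) + spares
```

For example, at an assumed peak of 120 decryptions per second, `devices_needed(120, 500)` gives 1 HSM while `devices_needed(120, 1)` gives 120 smart cards; at an assumed peak of one decryption every two seconds, `devices_needed(0.5, 1)` gives 1 smart card. Add `spares` for redundancy, since a single device is also a single point of failure.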

Thanks Thomas, there is some good information here. So to sum up the points relative to the question: 1. Backup functionality; 2. Support for symmetric/hybrid encryption; 3. Programmability; 4. Documentation (for operational procedures); 5. Certifications (if you need them). 6. Additional point is to consider a smart card, instead of a full-blown HSM.
–
AviD♦ May 30 '13 at 13:20

Btw re procedures, these could just as well be external, if not orthogonal, to the actual product. Sure, they need to be specific to the particular product, but I would be more interested in knowing up front what specific features or functionality should be required in the product, irrespective of how the procedure is implemented.
–
AviD♦ May 30 '13 at 13:23