Confused by crypto? Here's what that password hashing stuff means in English

Encryption, certificates, public and private keys – it's all here

Cryptography is dead hard. But being conversant in the key aspects of cryptography – to the extent that you could even explain some of it to colleagues and management – puts you one step ahead of most. Here are five things that'll make you sound like you know what you're talking about.

1. Digital certificates

The most common place to come across a digital certificate is on a Web site that's secured with HTTPS (the secured version of the HyperText Transfer Protocol). In fact, the security of a Web site has two facets:

Authentication: a mechanism that lets you be sure that the owner of the Web site is who it says it is.

Encryption: encoding the communications between your browser and the Web site so that nobody can sit between you and it and intercept (and decode) what you're sending and receiving. We'll come to encryption later.

Authentication is achieved via a digital certificate. This is an electronic document that's issued to the owner of a Web site by a Certification Authority (“CA”). To obtain a certificate from a CA you have to convince them of your credentials (so if you're a company you'll have to provide evidence of the company's registered office and such like).

Digital certificates are all about trust: basically the CA is vouching for the identity of the Web site. So you trust that www.facebook.com is genuine because you trust DigiCert (its CA) to vouch for it. Note that your browser can also check whether the certificate is still valid and hasn't been revoked by the issuing CA for any reason.

If you clicked on the padlock in the browser's address bar and it said the certificate was issued by Bob's Dodgy Certificate Services of Nowhere, Alabama you'd probably be less trusting. Oh, and how does your computer know whom to trust? Easy: the vendors of your computer's operating system and Web browser equip them with lists of trusted CAs, which are updated regularly.
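To see that trusted list from the programming side, here's a minimal sketch using Python's standard ssl module. The helper name describe_trust is made up for illustration, and the number of CAs it reports depends entirely on the machine it runs on (some systems load their CA files lazily, so the count can even be zero until a connection is made).

```python
import ssl


def describe_trust(context: ssl.SSLContext) -> dict:
    """Summarise how a TLS context decides whom to trust (illustrative helper)."""
    stats = context.cert_store_stats()
    return {
        # True when the context refuses servers that present no valid certificate
        "verifies_certificates": context.verify_mode == ssl.CERT_REQUIRED,
        # True when the certificate must also match the host name requested
        "checks_hostname": context.check_hostname,
        # Number of CA certificates currently loaded from the system's list
        "trusted_ca_count": stats["x509_ca"],
    }


# The default context picks up the CA list shipped with the OS or Python build.
info = describe_trust(ssl.create_default_context())
print(info)
```

The point is simply that the trust decisions are driven by a list you didn't write yourself: somebody shipped it to you.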

2. Certificate pinning

Some vendors take it a step further and demand that only a particular CA is acceptable for their certificates. Google's the classic example: its Chrome browser will expect the CA for the certificate on google.com, gmail.com and Google's various other sites to be Google Internet Authority G2; if it were to see a cert from a different supplier it would smell a rat. Specifying which CA is acceptable for your certificate is referred to as “pinning”.
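A rough sketch of the idea, assuming the client keeps a list of SHA-256 fingerprints of the CA certificates it's prepared to accept. The function names and the sample bytes are invented for illustration; a real client would take the issuing CA's certificate (in DER form) from the chain the server presents, and real pinning schemes often pin a public key rather than a whole certificate.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a certificate's raw (DER) bytes, as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned_fingerprints: list) -> bool:
    """Accept the certificate only if its fingerprint is on the pinned list."""
    actual = fingerprint(der_cert)
    return any(hmac.compare_digest(actual, pin) for pin in pinned_fingerprints)


# Stand-in bytes, NOT real certificates: just enough to show the comparison.
expected_ca_cert = b"pretend DER bytes of the CA we expect"
rogue_ca_cert = b"pretend DER bytes of some other CA"

pins = [fingerprint(expected_ca_cert)]
print(is_pinned(expected_ca_cert, pins))  # the pinned CA passes
print(is_pinned(rogue_ca_cert, pins))     # anything else smells a rat
```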

3. What's hashing?

A hash function, in a cryptographic sense, takes a chunk of data and makes it into another anonymous-looking chunk of data that is, to all intents and purposes, impossible to revert into the original form. Why do you need it? Simple: would you rather store a plain-text password or an unreadable version of it in your server's password database?

The concept's straightforward enough. Your computer's operating system stores its passwords in hashed form. When a user attempts to log in, instead of decrypting the stored version and comparing the plain-text version, it takes the password they typed, hashes it, and compares the result with the stored version. Note that regardless of the size of the original, the hash is always the same size, so SHA-512 (one of the SHA-2 family), for example, produces a hash of 512 bits, or 64 bytes.
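Here's a minimal sketch of that hash-then-compare login check using Python's hashlib. The function names and passwords are made up, and it's deliberately bare: real password stores also add a per-user random “salt” and use a deliberately slow function, neither of which is shown here.

```python
import hashlib
import hmac


def hash_password(password: str) -> bytes:
    """Hash a password with SHA-512 (sketch only: no salt, no slow function)."""
    return hashlib.sha512(password.encode("utf-8")).digest()


def check_login(attempt: str, stored_hash: bytes) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt), stored_hash)


# What goes in the password database: the hash, never the plain text.
stored = hash_password("correct horse battery staple")

print(check_login("correct horse battery staple", stored))  # right password
print(check_login("letmein", stored))                        # wrong password

# The hash is 64 bytes whatever the size of the input.
print(len(hash_password("a")), len(hash_password("a" * 10_000)))
```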

Hashes are used throughout the secure communications world – we'll mention them again later.

If you're setting up security on a system and it asks you what hash function you want, don't choose Message Digest 5 (“MD5”). Why? Because MD5 has been proven to be susceptible to “collisions”, where two different chunks of original data hash to the same result. Any hash function has collisions in theory, since it squeezes unlimited input into a fixed-size output; the problem with MD5 is that finding them has become practical. Attacks on MD5 hashes are ten-a-penny these days.
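To make “collision” concrete, here's a toy sketch. It does not attack MD5; instead it invents a deliberately feeble hash (the first two bytes of SHA-256, so only 65,536 possible outputs) purely so that a collision can be found by brute force in a blink. The names tiny_hash and find_collision are made up for illustration.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak 16-bit 'hash': the first two bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Try messages '0', '1', '2', ... until two share the same tiny hash."""
    seen = {}
    for i in count():
        message = str(i).encode()
        h = tiny_hash(message)
        if h in seen:
            # Two different messages, one hash value: a collision.
            return seen[h], message
        seen[h] = message


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

With only 65,536 possible outputs the loop is guaranteed to finish after at most 65,537 tries. A healthy hash function has so many possible outputs that this sort of search is hopeless; MD5's weakness is that clever attackers no longer need to search blindly.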

Go for a current flavour of the Secure Hash Algorithm (SHA) instead: SHA-2 and SHA-3 have thus far not been shown to be susceptible to practical attacks (the older SHA-1 has, so avoid that as well). One of the SHA-2 flavours is fine, but since SHA-3 was officially released as a standard in 2015 (the underlying algorithm having been chosen in 2012) you should choose that if it's available in your implementation. Is that because it's known to be more secure than SHA-2? Nope: it's just that it's only been around for a short while and hence people have had far less opportunity to break it than its 2001-launched predecessor.
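As a sketch of “choose SHA-3 if it's available”, the snippet below asks Python's hashlib what it supports and falls back to SHA-2 otherwise. The function names and the preference order are assumptions for illustration, not a recommendation from any standard.

```python
import hashlib


def pick_hash_name() -> str:
    """Prefer SHA-3 where the implementation offers it, else fall back to SHA-2."""
    for name in ("sha3_256", "sha256"):
        if name in hashlib.algorithms_available:
            return name
    raise RuntimeError("neither SHA-3 nor SHA-2 is available")


def digest(data: bytes) -> str:
    """Hex digest of the data using whichever algorithm was picked."""
    return hashlib.new(pick_hash_name(), data).hexdigest()


print(pick_hash_name())
print(digest(b"hello, world"))
```

Both candidates here produce 256-bit results, so the rest of the system doesn't need to care which one was chosen.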

4. Symmetric and asymmetric encryption

On to encryption. Most approaches to encryption that we come across in IT use the concept of a key: the sender takes the original data and encrypts it in some way based on the key, then the receiver uses a key to decode what he receives. In symmetric encryption you use the same key for encoding as for decoding. In asymmetric encryption you have two keys that are mathematically related, using one for encoding and the other for decoding.
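The symmetric half of that is easy to sketch with a toy cipher. The code below is an illustration only, not a real algorithm such as AES: it stretches the key into a stream of bytes using SHA-256 and XORs it with the data, so running the same function with the same key both encrypts and decrypts. All the names are invented.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into 'length' pseudo-random bytes (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = b"the one key both sides share"
ciphertext = xor_cipher(key, b"meet me at noon")
print(ciphertext.hex())
print(xor_cipher(key, ciphertext))  # same key, same function: plain text again
```

Anyone without the key gets gibberish; anyone with it can both read and write messages, which is exactly why getting the key to the other side safely is the hard part.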

Symmetric cryptography is very fast, but you have the problem of sharing the key in the first place. The asymmetric approach gets rid of the need to share the key but it's hideously slow (maybe 10,000 times slower than the symmetric alternative). The answer: use asymmetric to share the key at the start of the transfer, then use the shared key for symmetric cryptography on the actual data transfer.

And this is precisely what happens. To quote my copy of Chrome when I check out the encryption information on the address bar padlock: “The connection is encrypted and authenticated using AES_128_GCM and uses ECDHE_RSA as the key exchange mechanism” (ECDHE, or Elliptic Curve Diffie-Hellman Ephemeral, is an elliptic curve key exchange algorithm; AES is the symmetric cipher that does the bulk of the work).
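Here's a toy sketch of the key exchange step. It uses plain Diffie-Hellman with ordinary modular arithmetic rather than the elliptic curve version Chrome reports, and the prime, the generator and the function names are assumptions picked for illustration; the numbers are far too small for real use.

```python
import hashlib
import secrets

# Toy parameters: 2**127 - 1 is prime, but nowhere near big enough in practice.
P = 2**127 - 1
G = 3


def make_keypair():
    """Pick a random private number and derive the public value sent on the wire."""
    private = secrets.randbelow(P - 3) + 2  # somewhere in the range 2 .. P-2
    public = pow(G, private, P)
    return private, public


def shared_key(my_private: int, their_public: int) -> bytes:
    """Combine my private number with their public value into a 32-byte key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


a_private, a_public = make_keypair()
b_private, b_public = make_keypair()

# Only the public values cross the network, yet both ends derive the same key.
print(shared_key(a_private, b_public) == shared_key(b_private, a_public))
```

The 32 bytes that fall out are what a symmetric cipher would then use for the actual data transfer.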

5. Public and private keys

Right, are you listening carefully? I'm going to try to do public key cryptography in 250 words or less.

Public key cryptography is an asymmetric approach: stuff is encoded using one key and decoded with another. Each party generates two keys (a public one and a private one), keeps the private one private and publishes the public one. As we mentioned earlier, each party's two keys are mathematically related to each other.

If A wants to send a message to B, A gets B's public key and uses it to encrypt the message. When B receives it, B uses his own private key to decode the message. Job done.
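For the curious, here's that exchange as a toy sketch using textbook RSA with comically small numbers. The primes, exponent and function names are chosen for illustration; real keys are hundreds of digits long and real systems add padding, none of which appears here.

```python
# B generates a key pair from two (tiny) secret primes.
p, q = 61, 53
n = p * q                 # 3233, part of both keys
phi = (p - 1) * (q - 1)   # 3120, kept secret
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (2753), mathematically tied to e

b_public_key = (e, n)     # B publishes this
b_private_key = (d, n)    # B keeps this to himself


def encrypt(message: int, public_key: tuple) -> int:
    """A encrypts a number smaller than n using B's public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """B recovers the number using his own private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, b_public_key)
print(ciphertext)                          # what travels over the wire
print(decrypt(ciphertext, b_private_key))  # 65 again
```

Note that pow(e, -1, phi) needs Python 3.8 or later.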

Except it isn't – because who's to say the message hasn't been sent to B by someone pretending to be A? There's an extra step, then: A needs to “sign” the message, and B needs to check the “signature”.

To do this, A hashes the message (told you we'd come back to hashing) into a nice compact result. A then uses its own private key to encrypt the hash and sends it to B. B uses A's public key to decrypt the hash.

All B now has to do is hash the original message that A sent and compare with the hash it just received and decrypted; if they're the same, then B can be confident that the message really was from A, because only A could have encrypted it with A's private key.
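The sign-and-check dance can be sketched the same way, again with textbook RSA and toy-sized keys. The primes (two known Mersenne primes), the exponent and the function names are assumptions for illustration, and real signature schemes add padding that's omitted here.

```python
import hashlib

# A's toy key pair. Both numbers are prime, but far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                            # A's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # A's private exponent


def message_hash(message: bytes) -> int:
    """Hash the message into a nice compact number that fits under N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """A 'encrypts' the hash with its own private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """B 'decrypts' the signature with A's public key and compares hashes."""
    return pow(signature, E, N) == message_hash(message)


signature = sign(b"Pay Bob 10 pounds")
print(verify(b"Pay Bob 10 pounds", signature))    # genuine message
print(verify(b"Pay Bob 1000 pounds", signature))  # tampered message
```

Change so much as a character of the message and the hashes no longer match, so the signature vouches for the content as well as the sender.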

Which is why, if your private key gets disclosed, you must immediately revoke the key pair and generate a new one, otherwise someone else could masquerade as you.

1 (continued). Self-signed digital certificates

I said there would be five things to remember, so I'm going to cheat and revisit item 1. You've probably heard of a self-signed certificate: this is one where you've not applied to a CA but have generated the certificate yourself. But what's the point of that, given that the whole point of a cert is to get a trusted party to vouch for your identity, which it clearly won't if you've self-signed it?

Easy: the cert does more than identify you – it's also, among other things, the means by which you publish your public key. So I just checked out the certificate on GlobalSign's site, for instance, and it told me that it has a 256-byte (2,048 bit) public key and then listed the content of it.

Wrapping up

Cryptography, then, is full of weird maths. It often feels like witchcraft (not least when you wonder how on earth the mathematicians concocted a means of encrypting a message with one key then decrypting it with another). But to be conversant you don't have to understand it in any great detail, just to know the concepts and roughly how they work.