Understanding the Secure Hash Algorithm 2: Part 1

A handy reference for understanding one of the most used cryptographic hash algorithms: SHA-2.

12 May 2018

This is the first in a three-part series of articles that break down the SHA-2 algorithm. In part one, we'll go over the initial definitions that are integral to understanding the algorithm and write sample code in C. In part two, we'll compute the hash given the initial values we generated in part one. And then in part three, we'll take another look at the program, compare it against the de facto implementations and see how we can optimise it to run faster.

SHA-2

SHA-2 (Secure Hash Algorithm 2) is a family of hash functions used constantly across the web to compute cryptographic hashes, and a great deal of the web's security depends on it. That being said, I'm not exactly sure how it works and thought it would be a good exercise to sit down and break it into digestible chunks. This series works through the SHA-256 member of the family.

My reference is the PDF published on the National Institute of Standards and Technology site here.

Here are a few concepts that need to be internalised before moving forward.

Definitions

One way function

A function that is easy to compute but hard to reverse: given an output, it is infeasible to find an input that produces it.

Ability to process arbitrary length inputs

A hash function can split up messages into smaller chunks and then operate on them sequentially.

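Merkle-Damgård construction

A way to build a cryptographic hash function for arbitrary-length inputs out of a fixed-size compression function. The padded message is processed block by block, and each step mixes one block into a running state. With suitable length padding, the resulting hash function retains the collision resistance of the compression function.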
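To make the chaining idea concrete, here is a minimal sketch of the construction in C. The names `toy_compress`, `md_hash` and `BLOCK_BYTES`, the 8-byte block size and the FNV-style mixing step are all my own assumptions for illustration. The toy compression function is not secure and is not what SHA-2 uses, and the input is assumed to be padded already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 8 /* hypothetical block size, chosen only for the demo */

/* Toy compression function (NOT secure): mixes one block into the state. */
static uint32_t toy_compress(uint32_t state, const uint8_t block[BLOCK_BYTES]) {
    for (size_t i = 0; i < BLOCK_BYTES; i++) {
        state = (state ^ block[i]) * 16777619u; /* FNV-1a style step */
    }
    return state;
}

/* Merkle-Damgard iteration: start from a fixed initial value and feed
 * each block, in order, through the compression function. */
uint32_t md_hash(const uint8_t *padded, size_t len) {
    assert(len % BLOCK_BYTES == 0); /* caller must pad first */
    uint32_t state = 2166136261u;   /* fixed initial value (assumed) */
    for (size_t off = 0; off < len; off += BLOCK_BYTES) {
        state = toy_compress(state, padded + off);
    }
    return state;
}
```

The thing to notice is that the output after block N is the only input carried into block N+1, which is why the security of the whole hash can be argued from the compression function alone.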

Time to brute-force a hash function

For a hash function whose digest is L bits long, finding a message that matches a given digest by brute force takes about 2 raised to the L evaluations. This is known as a pre-image attack.
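To get a feel for how large 2 raised to the L is, here is a rough back-of-the-envelope sketch. The function name `preimage_search_years` and the hash rate you pass in are assumptions of mine, not figures from the standard.

```c
#include <assert.h>

#define SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/* Years needed to try 2^digest_bits inputs at a given (assumed) hash rate. */
double preimage_search_years(int digest_bits, double hashes_per_second) {
    assert(digest_bits >= 0 && digest_bits <= 1023); /* stay within double range */
    assert(hashes_per_second > 0.0);
    double evaluations = 1.0;
    for (int i = 0; i < digest_bits; i++) {
        evaluations *= 2.0; /* exact: doubling only changes the exponent */
    }
    return evaluations / hashes_per_second / SECONDS_PER_YEAR;
}
```

For example, `preimage_search_years(256, 1e12)` models a 256-bit digest attacked at an arbitrary trillion hashes per second, and comes out at roughly 3.7 x 10^57 years.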

Initialisation

First, we begin by calculating all the initial values. There are two sets: (1) the first 32 bits of the fractional parts of the square roots of the first 8 primes, which form the initial hash value; and (2) the first 32 bits of the fractional parts of the cube roots of the first 64 primes, which are the round constants. I have generated them here.
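As a sketch of how these values can be derived, the code below computes both sets with exact integer roots instead of floating-point `sqrt` and `cbrt`. That is my own choice for the example and not necessarily how the program at the end of the post does it. It assumes a compiler that provides `unsigned __int128` (a GCC/Clang extension), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* assumption: GCC/Clang extension available */

/* Fill out[0..count-1] with the first `count` primes by trial division. */
static void first_primes(unsigned *out, int count) {
    assert(count > 0);
    int found = 0;
    for (unsigned n = 2; found < count; n++) {
        int prime = 1;
        for (unsigned d = 2; d * d <= n; d++) {
            if (n % d == 0) { prime = 0; break; }
        }
        if (prime) out[found++] = n;
    }
}

/* Largest x with x^power <= target, found by binary search. */
static uint64_t int_root(u128 target, int power) {
    assert(power == 2 || power == 3);
    uint64_t lo = 0, hi = (uint64_t)1 << 35; /* roots used here stay below 2^35 */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        u128 p = mid;
        for (int i = 1; i < power; i++) p *= mid;
        if (p <= target) lo = mid; else hi = mid - 1;
    }
    return lo;
}

/* floor(sqrt(p) * 2^32) == isqrt(p * 2^64); its low 32 bits are the fraction. */
void sha256_initial_hash(uint32_t h[8]) {
    unsigned primes[8];
    first_primes(primes, 8);
    for (int i = 0; i < 8; i++) {
        h[i] = (uint32_t)int_root((u128)primes[i] << 64, 2);
    }
}

/* floor(cbrt(p) * 2^32) == icbrt(p * 2^96); its low 32 bits are the fraction. */
void sha256_round_constants(uint32_t k[64]) {
    unsigned primes[64];
    first_primes(primes, 64);
    for (int i = 0; i < 64; i++) {
        k[i] = (uint32_t)int_root((u128)primes[i] << 96, 3);
    }
}
```

Scaling the prime by a power of two before taking the root moves the fractional bits into integer positions, so truncating to 32 bits drops the integer part of the root and keeps exactly the bits we want. The first initial hash value should come out as 0x6a09e667 and the first round constant as 0x428a2f98, matching the tables in the standard.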

Preparing the message

Next, we prepare the message for processing. For this tutorial, I'll be using "SNSD" as the initial message. This message is 4 ASCII characters long, so its length l is 32 bits. We append a single '1' bit to the end (step 1); this marks the end of the message in the padded message. Then we add (step 2) k zero bits, where k is the smallest non-negative number that satisfies l + 1 + k ≡ 448 (mod 512). Solving for k in this message gives us k = 448 - (32 + 1) = 415 zero bits. After that, we add the message length as a 64-bit big-endian integer in the last 64 bits of the block (step 3). In total that is 32 + 1 + 415 + 64 = 512 bits, exactly one block.

Part  | S    | N    | S    | D    | 1 bit | 415 zero bits | Length bits (64)
Value | 0x53 | 0x4E | 0x53 | 0x44 | 0x1   | 415 * 0x0     | 0x0000000000000020
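The three padding steps can be sketched in C as follows. This is a minimal byte-oriented version that assumes the message is a whole number of bytes; the function name `sha256_pad` and its signature are my own, not taken from the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the padded message into out and returns its length in bytes
 * (always a multiple of 64), or 0 if out is too small. */
size_t sha256_pad(const uint8_t *msg, size_t len, uint8_t *out, size_t cap) {
    assert(msg != NULL && out != NULL);
    /* message + one 0x80 byte + 8 length bytes, rounded up to a 64-byte block */
    size_t padded = ((len + 1 + 8 + 63) / 64) * 64;
    if (padded > cap) return 0;

    memcpy(out, msg, len);
    out[len] = 0x80;                            /* step 1: '1' bit, then seven '0' bits */
    memset(out + len + 1, 0, padded - len - 1); /* step 2: the remaining zero bits */

    uint64_t bits = (uint64_t)len * 8;          /* step 3: length in bits, big-endian */
    for (int i = 0; i < 8; i++) {
        out[padded - 1 - i] = (uint8_t)(bits >> (8 * i));
    }
    return padded;
}
```

Calling `sha256_pad((const uint8_t *)"SNSD", 4, out, sizeof out)` with a 64-byte buffer should return 64, with byte 4 set to 0x80 and the final byte set to 0x20. Because we work in bytes, the single '1' bit and the first seven zero bits are written together as 0x80.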

Reading in the message

Next, we split the padded message into blocks so we can process it. This step is simple for this short example, since the padded message is exactly one 512-bit block. The code at the end of the post handles more difficult lengths of messages. We'll reference the Nth 512-bit block as M^(N) and the Wth 32-bit word within that block as M^(N)_W. Therefore, for this message we have:

Word M^(1)_1 = 0x534E5344

Word M^(1)_2 = 0x80000000

Word M^(1)_3 = 0x00000000

[.... words of zeroes ....]

Word M^(1)_16 = 0x00000020
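Splitting a block into the sixteen words listed above comes down to reading four bytes at a time in big-endian order. Here is a small sketch; the function name `block_to_words` is my own, and the array is 0-indexed, so `words[0]` holds word 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split one 64-byte (512-bit) block into sixteen big-endian 32-bit words. */
void block_to_words(const uint8_t block[64], uint32_t words[16]) {
    assert(block != NULL && words != NULL);
    for (size_t w = 0; w < 16; w++) {
        const uint8_t *p = block + 4 * w;
        words[w] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
}
```

Building the words with shifts rather than casting the byte pointer to `uint32_t *` keeps the result the same on little-endian and big-endian machines and avoids alignment problems.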

Putting it all together

The following (hastily put together, if I might add) C program will calculate the initial values that I showed earlier and the padded message. Look for part 2 where I will cover the hashing function.