If it helps anyone, here's a modified sha256.c file that takes 256 bytes off the stack and shrinks the main code loop by a factor of 8.
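Some context on the stack saving: a conventional implementation expands the full 64-word SHA-256 message schedule up front, and 64 words × 4 bytes = 256 bytes. A common way to avoid that is to keep only the last 16 schedule words in a circular buffer (64 bytes) and compute each new word in place. The sketch below assumes this is roughly the idea; it is an illustration, not the patch's code, and the names `schedule_full`, `schedule_next` and `check_schedules` are hypothetical. It checks that the ring-buffer variant yields exactly the same 64 words as the full expansion.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate right; callers here only use n in 1..31. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
	return (x >> n) | (x << (32 - n));
}

/* FIPS 180-4 message-schedule functions sigma0 and sigma1. */
static inline uint32_t s0(uint32_t x) { return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); }
static inline uint32_t s1(uint32_t x) { return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); }

/* Reference: expand all 64 schedule words up front (256 bytes). */
static void schedule_full(const uint32_t in[16], uint32_t W[64])
{
	for (int t = 0; t < 16; t++)
		W[t] = in[t];
	for (int t = 16; t < 64; t++)
		W[t] = s1(W[t - 2]) + W[t - 7] + s0(W[t - 15]) + W[t - 16];
}

/*
 * Hypothetical ring-buffer variant (64 bytes). Must be called with
 * t = 0, 1, ..., 63 in order. For t >= 16 the slot W[t & 15] still holds
 * W[t-16], whose last use is this step, so it is overwritten in place.
 */
static uint32_t schedule_next(uint32_t W[16], int t)
{
	if (t >= 16)
		W[t & 15] += s1(W[(t - 2) & 15]) + W[(t - 7) & 15] + s0(W[(t - 15) & 15]);
	return W[t & 15];
}

/* Assert that both variants yield the same 64 words for one block. */
static void check_schedules(const uint32_t in[16])
{
	uint32_t full[64], ring[16];

	schedule_full(in, full);
	for (int t = 0; t < 16; t++)
		ring[t] = in[t];
	for (int t = 0; t < 64; t++)
		assert(schedule_next(ring, t) == full[t]);
}
```

The ring buffer works because every schedule word W[i] is last read at step i+16, exactly when its slot is reused.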

The speed penalty is pretty minor, and might even turn into a net speedup thanks to better icache usage.

Includes x86 stubs for a standalone self-test using the NIST test vectors. Enable the #if 0 parts and remove the commented-out debug lines for production use.
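For anyone who wants to run the NIST vectors outside the kernel, here is a minimal portable sketch of a single-shot SHA-256 with a self-test. It is not the patch's code and contains no x86 stubs; it assumes only C99 and its standard library, and the names `transform`, `sha256_oneshot` and `sha256_selftest` are hypothetical. It keeps the schedule in a 16-word ring buffer, uses byte-wise big-endian loads (safe for any alignment), and builds the padding with memset().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const uint32_t K[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Byte-at-a-time big-endian access: works for any alignment. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16); p[2] = (uint8_t)(v >> 8); p[3] = (uint8_t)v;
}

/* One compression; the schedule lives in a 16-word ring buffer, not W[64]. */
static void transform(uint32_t H[8], const uint8_t *block)
{
	uint32_t W[16], a = H[0], b = H[1], c = H[2], d = H[3];
	uint32_t e = H[4], f = H[5], g = H[6], h = H[7];

	for (int t = 0; t < 64; t++) {
		if (t < 16) {
			W[t] = load_be32(block + 4 * t);
		} else {
			uint32_t x = W[(t - 15) & 15], y = W[(t - 2) & 15];
			W[t & 15] += (ror32(y, 17) ^ ror32(y, 19) ^ (y >> 10)) + W[(t - 7) & 15]
			           + (ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3));
		}
		uint32_t t1 = h + (ror32(e, 6) ^ ror32(e, 11) ^ ror32(e, 25))
		            + (g ^ (e & (f ^ g))) + K[t] + W[t & 15];		/* Ch(e,f,g) */
		uint32_t t2 = (ror32(a, 2) ^ ror32(a, 13) ^ ror32(a, 22))
		            + ((a & b) | (c & (a | b)));			/* Maj(a,b,c) */
		h = g; g = f; f = e; e = d + t1;
		d = c; c = b; b = a; a = t1 + t2;
	}
	H[0] += a; H[1] += b; H[2] += c; H[3] += d;
	H[4] += e; H[5] += f; H[6] += g; H[7] += h;
}

/* Single-shot hash; the padding is built with memset(), no padding array. */
static void sha256_oneshot(const uint8_t *msg, size_t len, uint8_t out[32])
{
	uint32_t H[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
	                  0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
	uint64_t bits = (uint64_t)len * 8;	/* message length in bits */
	uint8_t buf[64];

	for (; len >= 64; msg += 64, len -= 64)
		transform(H, msg);
	memcpy(buf, msg, len);
	buf[len++] = 0x80;
	if (len > 56) {				/* no room left for the length field */
		memset(buf + len, 0, 64 - len);
		transform(H, buf);
		len = 0;
	}
	memset(buf + len, 0, 56 - len);
	store_be32(buf + 56, (uint32_t)(bits >> 32));
	store_be32(buf + 60, (uint32_t)bits);
	transform(H, buf);
	for (int i = 0; i < 8; i++)
		store_be32(out + 4 * i, H[i]);
}

/* NIST example messages plus the empty string. */
static void sha256_selftest(void)
{
	static const char *const vec[][2] = {
		{ "abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad" },
		{ "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
		  "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1" },
		{ "", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
	};
	uint8_t d[32];
	char hex[65];

	for (size_t i = 0; i < sizeof(vec) / sizeof(vec[0]); i++) {
		sha256_oneshot((const uint8_t *)vec[i][0], strlen(vec[i][0]), d);
		for (int j = 0; j < 32; j++)
			sprintf(hex + 2 * j, "%02x", d[j]);
		assert(strcmp(hex, vec[i][1]) == 0);
	}
}
```

The 56-byte vector is the interesting one: after the 0x80 byte there is no room for the 8-byte length field, so the padding spills into a second block.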

A few other additions:
- Uses optimized unaligned-fetch and byte-swap routines.
- Reordered the Ch() and Maj() functions slightly to work better on 2-address instruction sets.
- Got rid of two temporaries in the round function.
- Removed the need to memset() 256 bytes on the stack on every call to sha256_transform().
- Uses memset() rather than a padding array.
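The list doesn't show the reordered Ch() and Maj(). A widely used pair of reformulations is sketched below and checked against the FIPS 180-4 definitions; these are an assumption and may not match the patch's exact versions, and the `_ref`/`_alt`/`check_ch_maj` names are hypothetical. Ch() drops the NOT (3 operations instead of 4) and Maj() drops one operation (4 instead of 5).

```c
#include <assert.h>
#include <stdint.h>

/* FIPS 180-4 textbook definitions. */
static inline uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
static inline uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

/* Common reformulations: no NOT in Ch(), one fewer operation in Maj(). */
static inline uint32_t ch_alt(uint32_t x, uint32_t y, uint32_t z)  { return z ^ (x & (y ^ z)); }
static inline uint32_t maj_alt(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (z & (x | y)); }

/*
 * Both pairs are bitwise functions, so agreeing on all 8 combinations of
 * per-bit inputs (all-zeros / all-ones words) proves them equal.
 */
static void check_ch_maj(void)
{
	for (unsigned i = 0; i < 8; i++) {
		uint32_t x = (i & 1) ? 0xffffffffu : 0;
		uint32_t y = (i & 2) ? 0xffffffffu : 0;
		uint32_t z = (i & 4) ? 0xffffffffu : 0;
		assert(ch_alt(x, y, z) == ch_ref(x, y, z));
		assert(maj_alt(x, y, z) == maj_ref(x, y, z));
	}
}
```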

A few questions:
- Should we just use a u64 to count the bytes rather than doing carries by hand?
- Do we actually need support for hashing more than 2^32 bytes in the first place?
- This does the aligned/unaligned check at the outermost feasible loop position in the code, leading to quite a bit of duplication. If the processor supports unaligned loads natively, it isn't even necessary at all. Any better ideas? Alternatives I can think of:
  - Moving the test to once per call to sha256_transform(), on the grounds that the latter outweighs it by a large amount.
  - Always using the unaligned load, likewise.
  - Rolling my own unaligned block move-and-byte-swap (using aligned loads, shifts, and aligned stores), since the existing unaligned code isn't quite what we want.
  - Just using memcpy() and byte-swapping in place, and letting the L1 cache take care of it.
- The e() functions as written take a temp register in addition to the input and output. Would it be better to rewrite them as e.g.

      static inline u32 e0(u32 x)
      {
          u32 y = x;
          y = RORu32(y, 22-13) ^ x;
          y = RORu32(y, 13-2) ^ x;
          return RORu32(y, 2);
      }

  to get rid of the need? (And can someone figure out a similar trick for the s() functions? I don't think it's as important because there's less register pressure when they're used.)

(P.S. These modifications are in the public domain; copyright abandoned.)
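The proposed e0() rewrite can be checked mechanically. The sketch below assumes RORu32() is a plain 32-bit rotate right (a stand-in is defined here, along with a kernel-style u32 typedef); the `_ref`/`_alt`/`check_e_rewrites` names are hypothetical, and e1_alt() simply applies the same difference trick to e1() (rotations 6, 11, 25) as an illustration. Because rotate and XOR are linear over GF(2), agreeing on all 32 single-bit inputs proves the two forms equal for every input.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Stand-in for RORu32(); only n in 1..31 is used here. */
static inline u32 RORu32(u32 x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Straightforward forms: x stays live alongside a temporary. */
static inline u32 e0_ref(u32 x) { return RORu32(x, 2) ^ RORu32(x, 13) ^ RORu32(x, 22); }
static inline u32 e1_ref(u32 x) { return RORu32(x, 6) ^ RORu32(x, 11) ^ RORu32(x, 25); }

/* Difference trick: rotate the accumulator, then fold x back in. */
static inline u32 e0_alt(u32 x)
{
	u32 y = x;
	y = RORu32(y, 22 - 13) ^ x;	/* ror9(x) ^ x */
	y = RORu32(y, 13 - 2) ^ x;	/* ror20(x) ^ ror11(x) ^ x */
	return RORu32(y, 2);		/* ror22(x) ^ ror13(x) ^ ror2(x) */
}

static inline u32 e1_alt(u32 x)
{
	u32 y = x;
	y = RORu32(y, 25 - 11) ^ x;	/* ror14(x) ^ x */
	y = RORu32(y, 11 - 6) ^ x;	/* ror19(x) ^ ror5(x) ^ x */
	return RORu32(y, 6);		/* ror25(x) ^ ror11(x) ^ ror6(x) */
}

/* Linear in x, so checking every single-bit input covers all inputs. */
static void check_e_rewrites(void)
{
	for (int i = 0; i < 32; i++) {
		u32 x = (u32)1 << i;
		assert(e0_alt(x) == e0_ref(x));
		assert(e1_alt(x) == e1_ref(x));
	}
}
```

This only shows the forms compute the same value; whether a given compiler actually saves a register with them depends on the target and the optimizer.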