2 Answers

Introduction

AsicBoost speeds up Bitcoin mining in general (for ASICs and CPUs alike) by reducing the frequency of computing one part of the SHA-256 calculation.

A Bitcoin block header is 80 bytes long. With SHA-256 padding, it occupies 2 blocks of 64 bytes each. It gets hashed into a 32-byte value, which is then hashed again (1 block) to get the final value that is compared to the threshold.
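As a rough illustration of the layout and of conventional mining, here is a minimal Python sketch. The field values, the `make_header` and `mine` helpers, and the deliberately easy target are all hypothetical. `hashlib`'s `copy()` is used as a stand-in for caching the state after the first block, which is conceptually what a miner does.

```python
import hashlib
import struct

# Hypothetical field values, chosen only to make the layout visible.
version, timestamp, bits = 2, 1_400_000_000, 0x1D00FFFF
prev_hash = bytes(32)             # 32-byte previous block hash (zeros here)
merkle_root = bytes(range(32))    # 32-byte Merkle root: 0x00 .. 0x1f

def make_header(nonce):
    # Little-endian integers around the two 32-byte hashes: 80 bytes total.
    return (struct.pack("<I", version) + prev_hash + merkle_root
            + struct.pack("<III", timestamp, bits, nonce))

header = make_header(0)
# SHA-256 padding: 0x80, zeros, then the bit length as a 64-bit big-endian int.
padded = (header + b"\x80" + bytes(-(len(header) + 9) % 64)
          + struct.pack(">Q", 8 * len(header)))
block0, block1 = padded[:64], padded[64:]
print(len(header), len(padded))   # 80 128

# The Merkle root sits at header bytes 36..67, so it straddles both blocks.
assert block0[36:] == merkle_root[:28] and block1[:4] == merkle_root[28:]

def mine(target, max_nonce=1 << 20):
    first = hashlib.sha256(header[:64])   # first block hashed once, reused below
    for nonce in range(max_nonce):
        h = first.copy()                  # resume from the cached state
        h.update(header[64:76] + struct.pack("<I", nonce))  # second block
        digest = hashlib.sha256(h.digest()).digest()         # second hash, 1 block
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None

# Deliberately easy target (an assumption) so the loop ends quickly.
result = mine(target=1 << 248)
```

Each pass through the `for` loop hashes the second header block and then the 32-byte digest, which is where the per-nonce cost comes from.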

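In conventional mining, the result of compressing the first block can be computed once and reused for every nonce. The inner loop (one iteration per nonce) therefore has 2 calculations of block expansion and 2 calculations of block compression: one of each for the second block of the header, and one of each for the second hash.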

What AsicBoost proposes is that we somehow find a bunch of blockHeader values where sha256Block0 is different but sha256Block1 is the same. The Merkle root field straddles both hashing blocks (its first 28 bytes fall in the first block and its last 4 bytes in the second), so we need to group candidates by the last 4 bytes of the Merkle hash. The mining algorithm then iterates over groups of candidates that share the same second block.

Now notice that the inner loop performs 1 calculation of block expansion per candidate group, and then 1 calculation of block expansion plus 2 calculations of block compression per candidate block header.
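The two loops described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `expand`, `compress`, `hash_conventional` and `hash_group` are hypothetical names, the candidate group is made-up data, and the SHA-256 constants are derived from their definition rather than typed out. The final assertions check the result against `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _icbrt(n):
    """Floor of the cube root of n (Newton iteration from an upper bound)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

# SHA-256 constants: first 32 fractional bits of the cube roots (K) and
# square roots (H0) of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def expand(block):
    """Block expansion: 64-byte block -> 64-word message schedule."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def compress(state, w):
    """Block compression: mix a message schedule into an 8-word state."""
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def second_hash(state):
    """Hash the 32-byte first digest again: 1 expansion + 1 compression."""
    block = struct.pack(">8I", *state) + b"\x80" + bytes(23) + struct.pack(">Q", 256)
    return struct.pack(">8I", *compress(H0, expand(block)))

def tail_block(tail12, nonce):
    """Second header block: 12 shared bytes, the nonce, then SHA-256 padding."""
    return tail12 + struct.pack("<I", nonce) + b"\x80" + bytes(39) + struct.pack(">Q", 640)

def hash_conventional(midstate, tail12, nonce):
    # Per header: 2 expansions + 2 compressions.
    return second_hash(compress(midstate, expand(tail_block(tail12, nonce))))

def hash_group(midstates, tail12, nonce):
    # Per group: 1 expansion. Per header: 1 expansion + 2 compressions.
    w1 = expand(tail_block(tail12, nonce))
    return [second_hash(compress(m, w1)) for m in midstates]

# Hypothetical candidate group: three different first blocks, one shared tail.
heads = [bytes([i]) * 64 for i in range(3)]
tail12 = bytes(range(12))
midstates = [compress(H0, expand(h)) for h in heads]   # once, outside the nonce loop
for nonce in range(4):
    grouped = hash_group(midstates, tail12, nonce)
    assert grouped == [hash_conventional(m, tail12, nonce) for m in midstates]
    header = heads[0] + tail12 + struct.pack("<I", nonce)
    assert grouped[0] == hashlib.sha256(hashlib.sha256(header).digest()).digest()
```

The only difference between the two functions is where `expand(tail_block(...))` is called: once per header in `hash_conventional`, once per group in `hash_group`.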

Thus the technique wins over conventional mining when most candidate groups have more than one candidate, and the overhead of generating and sorting candidates is smaller than the gains from saving at most one calculation of block expansion per candidate.

I'm reproducing figures 2 and 3 from the paper below to provide an alternative interpretation.

Fig 2:

Fig 3:

Historically, mining is composed of an inner loop (red) and an outer loop (green). Each run through the inner loop, the nonce is incremented. This affects only Chunk 2 and causes all of the blocks in red to be re-evaluated. You can only do this 4 billion times (2^32) before you start creating duplicated results. At that point, the outer loop adjusts the layout of the block data to obtain a new random merkle root field, and you repeat.

In this implementation, you have to do 4 large operations for each pass through the inner loop. When changing the merkle root as part of the outer loop, you have to do an additional 2 large operations (in green) on top of the normal 4. However, the author of AsicBoost noticed that when altering the merkle root value, there's a 1 in 2^32 chance that its last 4 bytes, the only part of it that lies in Chunk 2, stay the same. In that case the contents of Chunk 2 don't change, so the output of Chunk 2's expander doesn't change either, and we can skip redoing that computation.

The key at this point is the Birthday Problem. Before we begin mining a block, we can compute many random merkle roots and find collisions (i.e. multiple merkle roots that have the same last 4 bytes, or "tail") without having to evaluate anywhere near 2^32 merkle roots. We also precompute the value labeled "mid state" associated with each of these different merkle roots.
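To get a feel for the numbers, here is a small Python sketch using the standard birthday approximation, assuming the 4-byte tails behave like uniformly random values. The function names are made up for this example.

```python
import math

TAILS = 2 ** 32   # number of possible 4-byte tails

def expected_colliding_pairs(n):
    """Expected number of pairs sharing a tail among n random merkle roots."""
    return n * (n - 1) / (2 * TAILS)

def collision_probability(n):
    """Approximate chance that at least two of n random roots share a tail."""
    return 1.0 - math.exp(-expected_colliding_pairs(n))

for n in (2 ** 14, 2 ** 16, 2 ** 18, 2 ** 20):
    print(n, round(collision_probability(n), 3), round(expected_colliding_pairs(n), 1))
```

Under this approximation, about 2^16 merkle roots already give roughly a 39% chance of at least one shared tail, and 2^20 roots give on the order of a hundred colliding pairs, far fewer evaluations than 2^32.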

Finally, this allows us an alternative way to generate a new digest without incrementing the nonce: reuse the current Chunk 2 value, choose a new merkle root that collides with the tail currently in use and update the "mid state" with the value associated with that merkle root (which we computed previously). Then re-evaluate all the red blocks except for the Expander attached to Chunk 2, since we haven't altered Chunk 2's value - that's only three large operations instead of the usual four.

Example

Let's say we found 3 merkle roots that all have the same tail and precomputed their associated mid states (A, B, C). Our mining loop looks something like this:

Set nonce=0, midstate=A (4 operations).

Set midstate=B (3 operations).

Set midstate=C (3 operations).

Set nonce=1, midstate=A (4 operations).

Set midstate=B (3 operations).

... And so forth.
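The schedule above can be written as a short Python sketch that only counts large operations. The `schedule` generator is hypothetical, and the 4 and 3 operation costs are taken directly from the description above.

```python
def schedule(midstates, nonces):
    """Yield (nonce, midstate, large_ops) in the order described above."""
    for nonce in nonces:
        for i, mid in enumerate(midstates):
            # A new nonce changes Chunk 2, so its expander is redone (4 ops);
            # swapping only the mid state reuses that expansion (3 ops).
            yield nonce, mid, 4 if i == 0 else 3

steps = list(schedule("ABC", range(2)))
total = sum(ops for _, _, ops in steps)
print(total, "ops vs", 4 * len(steps), "conventional")   # 20 ops vs 24 conventional
```

With 3 colliding merkle roots, every group of 3 digests costs 10 large operations instead of 12.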

Each merkle root collision that you precompute saves you 2^32 Expand operations, since you can reuse it across all 2^32 nonces. Because of the birthday problem, the cost to precompute these collisions is less than 2^32 per collision, so it can be a net gain.

In practice, you'll end up with multiple disjoint sets of colliding merkle roots rather than just the 1 set shown here. So you might cycle through {A, B, C}, then {D, E, F}, and switching from C to D will cost the normal 4 operations because the tail has changed. Or, you might feed each disjoint set to a different core/chip of the miner.
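One possible way to build those disjoint sets is to bucket candidate merkle roots by their tail. The sketch below is a toy model, not a real miner: `candidate_root` is a hypothetical stand-in for rearranging the block data, and the tail is shortened to 2 bytes (an assumption made only so that collisions appear after a few thousand candidates).

```python
import hashlib
from collections import defaultdict

TAIL_BYTES = 2   # demo assumption; the real tail is the last 4 bytes

def candidate_root(i):
    # Stand-in for a merkle root obtained from the i-th block layout.
    return hashlib.sha256(i.to_bytes(8, "little")).digest()

groups = defaultdict(list)
for i in range(2000):
    root = candidate_root(i)
    groups[root[-TAIL_BYTES:]].append(root)

# Keep only tails shared by at least two roots; each list is one disjoint set.
colliding = [roots for roots in groups.values() if len(roots) > 1]
print(len(colliding), "sets with a shared tail")
```

Each list in `colliding` plays the role of a set like {A, B, C}: its members can be cycled through at 3 operations each, or handed to a separate core or chip.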