Currently busying myself with the Bitcoin "mining" algorithm, I am wondering if the process really cannot be simplified.

For reference, the algorithm is basically SHA-256d:

$success := SHA256( SHA256( IV || nonce ) ) < target$

The resulting 256-bit hash is then interpreted as a single 256-digit number (base 2). If this number is lower than the given $target$, a valid nonce has been found. (The $IV$ can be considered constant.)
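As a minimal sketch of this check in Python (the `header_prefix` standing in for the constant part of the input is a hypothetical name, not Bitcoin's actual header layout):

```python
import hashlib

def check_nonce(header_prefix: bytes, nonce: int, target: int) -> bool:
    """Return True if SHA256(SHA256(header_prefix || nonce)),
    read as a 256-bit integer, is below the target."""
    data = header_prefix + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    # Bitcoin actually reads the digest in little-endian byte order;
    # for illustrating the "< target" comparison the byte order is immaterial.
    return int.from_bytes(digest, "big") < target
```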

At the moment, brute-forcing through all possible $nonce$s is deemed to be the only viable way of finding values that satisfy the above equation, and, considering the double-application of the hash function, this seems to be a reasonable assumption.

However, one may simplify the problem by (first) only looking at the second application of the hash function:

$success := SHA256( X ) < target$

I am not aware of the correct term (if there is one), so I would call this a "partial preimage" problem. The constraint on the output is not that it equals a given value, but that it is lower than a given value. In other words, and again simplified: the constraint is that the first $n$ digits (bits) of the output are 0, where $n$ may be comparatively low, 32 for example, as opposed to a full preimage attack where $n = 256$.

"SAT solving - An alternative to brute force bitcoin mining" seems to describe a logical approach to the problem: Transform the hash function to a boolean function and then, given the (relatively weak) constraint on the output of the hash function and (a part of) the known input $X$, perform backtracking to deduce a constraint on the input from the constraint on the output.

In papers about higher-order differential cryptanalysis of reduced versions of SHA-256 it is demonstrated that one may be able to predict part of the internal state reached after round $r$ from the internal state after round $r-d$ (for $d \in [2, r)$) with certain probabilities $p \le 1$, without actually computing rounds $r-d+1$ through $r$.

Now, considering the above constraint about the leading 0s of the output, I have been trying to find out over the past few days whether there may be a probabilistic approach which could yield an estimate of the probability of a certain partial state after round $r$ from the outcome of round $r-d$.

Of course, for this Bitcoin case the hash function does not need to be considered a black box or an oracle, because the complete input is known and so is the complete internal state at any point of the calculation.

If such an estimate existed for $r=64$ and $d \gg 1$, and it could be computed more efficiently than $d$ rounds of the hash function, it would possibly allow a shortcut through the second application of the hash function.

Note that the estimate does not have to be very accurate, nor does the probability have to be very high, to be of use. It is also not required that the estimate is safe in either direction; it is perfectly acceptable to miss some of the solutions (nonces) this way.

Is anyone aware of such a probabilistic approach for forward-estimation for some of the rounds of SHA256?

Can there actually be one of significance? (An estimate yielding, for example, $p=0.501$ for one bit of output is probably not significant in the sense that it may reduce computational complexity.)

Edit:

To illustrate the rationale behind my thoughts, let's look at the (differentials of the) basic operations in a statistical way:

a) XOR

For the XOR operation on single bits A and B

result := A xor B

the case is simple: without actually doing the calculation we can tell beforehand that the result will be the inverse of $B$ iff $A = 1$, and equal to $B$ iff $A = 0$, with probability 1. This holds equally, bitwise, for binary numbers $A$ and $B$ of arbitrary length.
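This per-bit behaviour is easy to confirm exhaustively; a tiny Python check over all 8-bit values:

```python
# Exhaustive check for n = 8: A xor B equals B when A is all zeros,
# and the bitwise inverse of B when A is all ones -- probability 1 in
# both cases, no sampling needed.
n = 8
mask = (1 << n) - 1
for B in range(1 << n):
    assert (0 ^ B) == B               # A == 000...0: result equals B
    assert (mask ^ B) == (~B & mask)  # A == 111...1: result is inverted B
```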

b) Addition

This one is more interesting because we get probabilities of 1 or less.

Again, for the single-bit operation A + B the result is the same as in the XOR case, with equal probability of 1.

However, when $A$ and $B$ are binary numbers of length $n > 1$ the result is different. Basically it turns out that the probability that one/all bit(s) in the result are inverted relative to $B$ is higher for a larger $A$;

extrema: $A = 0 \Rightarrow p = 0$, $A = 111\ldots1 \Rightarrow p = 1$

plus: $0 < A < 111\ldots1 \Rightarrow 0 < p < 1$

XOR and addition are commutative so that A and B may be exchanged freely, maybe even in parts.
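One way to make this intuition concrete (a sketch; the specific formalization via carries is my choice, not taken from a paper): for a uniformly random $n$-bit $B$, a carry propagates into bit position $k$ of $A + B$ with probability $(A \bmod 2^k)/2^k$, which is 0 for $A = 0$ and approaches 1 as the low-order bits of $A$ approach all ones. An exhaustive check for $n = 8$:

```python
# Probability (over uniformly random n-bit B) that adding a fixed A
# produces a carry into bit position k. Counting shows it equals
# (A mod 2^k) / 2^k, growing monotonically with the low-order part of A.
n = 8
mod = 1 << n

def carry_into_bit(A: int, k: int) -> float:
    low = 1 << k
    hits = sum(1 for B in range(mod) if (A % low) + (B % low) >= low)
    return hits / mod
```

For example, `carry_into_bit(0, 7)` gives 0.0 and `carry_into_bit(0b01111111, 7)` gives 127/128, matching the extrema above.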

I was thinking that observations like these could be leveraged to make statements like:

"If at some point $t$ in the calculation we obtain a value for some state variable $A$ that is greater than some threshold $X$, and [some statement about $B$], then the probability $p$ of a positive outcome of the complete hash function is greater/lower than $P$, and we continue/abort the current calculation."

Yes, I have seen (and linked to in my question :)) that web site. And I found that the "pure" SAT approach does not gain much. I'm just still wondering whether there are shortcuts to be found by simplifying (for instance the SAT approach) with probabilistic/heuristic means. -- I think I need to look up those papers about differential cryptanalysis I referred to; they would probably show more clearly how I believe SHA's complexity could be reduced by probabilistic/heuristic means.
– Hanno Binder, Mar 22 '14 at 17:32

Allegedly, it does when reduced to some 30 rounds. Which could be a reasonable approach. If it fails for, say, 15 rounds with significant probabilities, that may lead to a possible shortcut in the way I'm thinking of, reducing the computation to, e.g., 49 rounds + one probability estimation.
– Hanno Binder, Mar 23 '14 at 10:39

1 Answer

The Bitcoin mining algorithm cannot be simplified by exploiting any weakness in the SHA-2 hash function, with the current state of the art.

The problem is manifold. From the SHA-256 point of view, there is no (partial) preimage search algorithm that applies to the full hash function. Even worse, the attacks that penetrate fewer rounds have huge complexity. In fact, no practical (partial) preimage attack on any version of SHA-256 with more than 20 (out of 64) rounds has ever been constructed.

From the Bitcoin point of view, the situation is even worse. Recall that SHA-256 splits its input into 512-bit blocks and feeds them iteratively to the internal compression function, whose inversion is the core problem of preimage attacks. What is hashed in Bitcoin is the block header, which is 640 bits long and hence requires two calls of the compression function (it consumes 512 bits per call), plus one extra call in the outer application of SHA-256. The second call of the compression function gives little freedom to the attacker, who controls less than 128 bits of the 512-bit message block. As a result, even if SHA-256 were reduced threefold, it would likely withstand attacks in the Bitcoin setting, let alone the full version.

Bitcoin miners still do some optimization by precomputing the first 512 bits of the block header, and hence the first call of the compression function, but the subsequent steps cannot be optimized much.
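This midstate trick can be sketched in Python via `hashlib`'s `.copy()`, which snapshots the internal state after the first compression call (the 80-byte header below is a zero-filled placeholder, not real header data):

```python
import hashlib

header = bytes(80)  # placeholder; a real header holds version, prev-hash, etc.

# One compression call over the first 64 bytes, done once per header.
midstate = hashlib.sha256(header[:64])

def hash_with_nonce(nonce: int) -> bytes:
    """Finish the double SHA-256 for one nonce, reusing the midstate."""
    h = midstate.copy()  # skip recomputing the first compression call
    h.update(header[64:76] + nonce.to_bytes(4, "little"))
    return hashlib.sha256(h.digest()).digest()
```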

Thank you for answering. However, I am not convinced that the process cannot be simplified. In particular, I was interested in a probabilistic approach in the forward direction of the algorithm, which is not necessarily the same as breaking the compression function 'backwards'. Unfortunately, my research on this seems to be of no further use, given that the evolution of hardware (ASICs) has brought far greater speed increases than could ever be achieved by 'clever' exploitation of the algorithm; even if one could reduce the effort by 50%, the ASICs will easily outperform that in a short time.
– Hanno Binder, Oct 30 '13 at 16:35

@Hanno if it were that easy someone would have done it. SAT solving has been used successfully in cryptanalysis, but usually it's to assist with an attack found by manual analysis. With any modern cipher, you can't just plug the whole thing into a solver and hope to get anywhere.
– Antimony, Dec 21 '13 at 8:35


@HannoBinder "a probabilistic approach in forward direction of the algorithm which is not necessarily the same as breaking the compression function 'backwards'" - I believe these two things are in fact equivalent. If you know that a certain set of messages have a higher than normal probability of producing a digest which is in a specified range, and you have a digest within that range, then you can achieve better than brute force for finding the message which hashes to that digest. Which means you have "broken the compression function backwards" and managed a successful preimage attack.
– JBentley, Feb 17 '14 at 15:42


@JBentley Seen this way, your argument sounds valid. Yet, I'd say it depends on the definition of "brute force". If you can probabilistically "skip" some $d$ rounds inside the hash function, this does not mean you can rule out a set of input messages without actually doing $r-d$ rounds for each input message. So, "better than brute force": yes, possibly in terms of the number of computation steps required; no, in terms of the number of applications of the (partial) hash function needed.
– Hanno Binder, Feb 19 '14 at 13:55