I want to show that finding cycles in a cryptographic hash function is hard.

My thought: assume there is a black box that, given a cryptographic hash function $h$, finds some $x$ and $r$ in polynomial time such that applying $h$ to $x$ $r$ times gives $x$ again:
$$ h(h(\ldots h(x) \ldots)) = x $$

Can this be used to prove the function is not collision resistant or pre-image resistant?

@fgrieu, Do you mean that I know the pre-image of $x$, which is $h^{r-1}(x)$? What if I don't know $h^{r-1}$, because $r$ is very big.
– amanusk, Mar 5 '19 at 18:00

@kelalaka, thanks. Is there a better approach to this?
– amanusk, Mar 5 '19 at 19:30

The only approach I can think of to address the overall problem would be to show that finding cycles in random oracles is hard; that is, given $n$ queries to the random oracle, the probability of finding a cycle within those queries is $\le f(n)$ for some explicit $f(n)$, and show that, for any reasonable $n$, that is tiny...
– poncho, Mar 5 '19 at 21:16

I find this question a bit problematic/deeper than it seems. fgrieu in this answer showed that when an iterated random function has 7-bit output the cycles occur. We expect them, the question is how they are distributed and the relation with the resistance(s).
– kelalaka, Mar 6 '19 at 11:05

Standard collision search algorithms work by following such a chain with a tortoise–hare algorithm to find the point where a non-cyclic initial chain coincides with a cycle. But your black box somehow finds just the cycle without any initial chain. Although the distribution of cycle sizes is weighted more toward smaller than larger cycles with expectation $\frac 1 4 \sqrt{2\pi n}$ (Harris, Eq. 3.11), the number $q$ of elements that are themselves on cycles has the same distribution as $s$ (Harris, Eq. 3.12).
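The tortoise–hare search mentioned above can be sketched as follows. This is a minimal illustration, not a tuned collision finder: the toy hash `h` (SHA-256 truncated to 20 bits) and the helper names `floyd` and `collision` are assumptions made for this example.

```python
import hashlib

def h(x: int, bits: int = 20) -> int:
    """Toy hash (assumption): SHA-256 of the 8-byte encoding of x, truncated to `bits` bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def floyd(f, x0):
    """Return (mu, lam): tail length and cycle length of the chain x0, f(x0), f(f(x0)), ..."""
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:                 # phase 1: meet somewhere on the cycle
        tortoise, hare = f(tortoise), f(f(hare))
    mu, tortoise = 0, x0
    while tortoise != hare:                 # phase 2: locate the cycle entry point
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    lam, hare = 1, f(tortoise)
    while tortoise != hare:                 # phase 3: measure the cycle length
        hare = f(hare)
        lam += 1
    return mu, lam

def collision(f, x0):
    """Return (a, b) with a != b and f(a) == f(b), or None if x0 is itself on the cycle."""
    mu, lam = floyd(f, x0)
    if mu == 0:
        return None                         # no tail, so no collision from this start
    a = x0
    for _ in range(mu - 1):
        a = f(a)                            # last tail element before the cycle entry
    b = f(a)                                # the cycle entry point
    for _ in range(lam - 1):
        b = f(b)                            # last cycle element before the entry
    return a, b
```

Note that the collision comes from the junction of tail and cycle: a start point that is already on the cycle (`mu == 0`) yields no collision, which is exactly the situation of the black box in the question.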

For, e.g., a 256-bit hash function, as you need for the standard 128-bit collision resistance, the expected probability of stumbling upon an element on a cycle by chance is about $2^{-128}$. Even for a 128-bit hash function like MD5, where that probability is about $2^{-64}$, the expected cost of confirming that an element is on a cycle is $2^{64}$ steps for a generic algorithm. Maybe there's a clever way to speed this up in batch or in parallel, like the van Oorschot–Wiener collision search machine.
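As a rough check of these figures, assume the hash behaves like a uniformly random function on $n = 2^b$ points, for which the expected number of points lying on cycles is about $\sqrt{\pi n/2}$ (a standard random-mapping asymptotic, used here as an assumption about the hash). Then, for a uniformly chosen $x$:

$$ \Pr[x \text{ is on a cycle}] \approx \frac{\sqrt{\pi n/2}}{n} = \sqrt{\frac{\pi}{2}} \cdot 2^{-b/2} \approx 1.25 \cdot 2^{-b/2} $$

This gives roughly $2^{-128}$ for $b = 256$ and roughly $2^{-64}$ for $b = 128$.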

If, for some particular $b$-bit hash function aimed at random oracle applications like SHA3-256, you found a way to find an element on a cycle with substantially higher probability than $2^{-b/2}$, or a way to confirm whether an element is on a cycle or not with substantially lower cost than $2^{b/2}$, it would likely be worthy of publication and cast serious doubt on the security of the hash function.

My thought: assume there is a black box that, given a cryptographic hash function $h$, finds some $x$ and $r$ in polynomial time such that applying $h$ to $x$ $r$ times gives $x$ again:
$$ h(h(\ldots h(x) \ldots)) = x $$
Can this be used to prove the function is not collision resistant or pre-image resistant?

It is not a priori clear that this black box is helpful for finding collisions or preimages: clearly it does not itself find a collision, and it's not parametrized by an image for which it could find a preimage.

Of course, the field of generic collision search is a black art underlying a surprising array of cryptanalytic attacks including the usual hash function collision search, Pollard's $\rho$ for factorization, Pollard's $\rho$ for discrete logarithms, hash function preimage search including AES key recovery, and so on.

So it would not surprise me if a black box with this capability could be fruitfully used to accelerate a generic collision search, or if it could be proved not helpful, but it would be an interesting theorem either way!

*Harris 1960 is an excellent compendium of probability distributions for various quantities in various important types of random mappings. Read it!

My thought: assume there is a black box that, given a cryptographic hash function $h$, finds some $x$ and $r$ in polynomial time such that applying $h$ to $x$ $r$ times gives $x$ again:
$$ h(h(\ldots h(x) \ldots)) = x $$

I thought about this, and I found that the answer is yes. The oracle can be used to find preimages and collisions in an arbitrary function $$h : \{0,1\}^n \longrightarrow \{0,1\}^k.$$

Querying the oracle with $h'$, it will return a cycle of length $1$ if $h$ has a collision and that it has a fixed point. If $h$ does not have a fixed point, then the oracle will return a cycle of length $2^k$ that contains all elements. If so, the cycle structure can be re-randomized via an (efficiently computable) random permutation $\pi$ as $\pi \circ h$. Eventually, it will have a fixed point.

The oracle will (eventually) return a $1$-cycle in $h'$, say at the point $y$. From the construction, we know that $h(y) \neq y$ (if $h(y) = y$, then $y$ would not be a fixed point of $h'$). But we also know that $h(h(y)) = h(y)$. This gives the collision $(h(y), y)$.

Assume that we want to find a cycle of length 1. We call $\textsf{Oracle}(u_{A,B}, 1)$, with $A = 0, B = 2^k-1$. If it returns no, then there are no such cycles. Otherwise, we can conduct binary search on $A$ and $B$ to find the fixed point.

To conclude, this oracle is very strong and capable of solving any problem in NP. For instance, any 3SAT instance can be encoded as a function $v_{\varphi}$ that has a $1$-cycle whenever the formula $\varphi$ (in $k$ variables) is satisfied:

$$v_{\varphi}(x) =
\begin{cases}
x & \quad \text{if } \varphi(x_0,x_1,\ldots,x_{k-1}) = 1\\
x+1 \bmod 2^k& \quad \text{otherwise}.
\end{cases}
$$
If a $1$-cycle is returned, then $x$ encodes a satisfying assignment (possibly one of many), while if a $2^k$-cycle is returned, then $\varphi$ is unsatisfiable. So, finding a cycle in an arbitrary hash function $h$ is at least as hard as 3SAT.
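The encoding above can be tried out on small formulas with the following sketch. The clause representation, `make_v`, and `naive_cycle` are hypothetical names introduced for illustration; in particular `naive_cycle` is only an exponential-time stand-in for the assumed polynomial-time oracle, and it relies on the special shape of $v_{\varphi}$.

```python
from typing import Callable, List, Tuple

# A clause is three literals: +i means x_{i-1}, -i means NOT x_{i-1} (assumed encoding).
Clause = Tuple[int, int, int]

def make_v(clauses: List[Clause], k: int) -> Callable[[int], int]:
    """Build v_phi on k-bit integers: fixed points are exactly the satisfying assignments."""
    def satisfied(x: int) -> bool:
        def lit(l: int) -> bool:
            bit = (x >> (abs(l) - 1)) & 1
            return bit == 1 if l > 0 else bit == 0
        return all(any(lit(l) for l in clause) for clause in clauses)

    def v(x: int) -> int:
        return x if satisfied(x) else (x + 1) % (1 << k)
    return v

def naive_cycle(v: Callable[[int], int], k: int) -> Tuple[int, int]:
    """Stand-in for the oracle: walk from 0 and return (point on cycle, cycle length)."""
    x = 0
    for steps in range(1, (1 << k) + 1):
        y = v(x)
        if y == x:
            return x, 1          # fixed point: x encodes a satisfying assignment
        if y == 0:
            return 0, steps      # wrapped around: the single cycle through all points
        x = y
    raise ValueError("function is not of the v_phi form")

# (x0 OR x1 OR x2) AND (NOT x0 OR NOT x1 OR NOT x2), in k = 3 variables
v_sat = make_v([(1, 2, 3), (-1, -2, -3)], k=3)
# x0 AND NOT x0, written as two padded clauses in k = 1 variable
v_unsat = make_v([(1, 1, 1), (-1, -1, -1)], k=1)
```

With a real cycle-finding oracle in place of `naive_cycle`, the cycle length alone would distinguish the satisfiable case (length $1$) from the unsatisfiable one (length $2^k$).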

Edit: I have thought about the issue again and I now think the answer is much simpler. I'm leaving my previous answer below unchanged in case parts of it are useful to someone as related information.

However, I think the answer to the original question is simply that it is not necessarily hard to find cycles in a cryptographically strong hash function, when such a function is understood to be an efficiently computable function resistant to preimage, second-preimage and collision attacks. As a counterexample, let $H$ be the function from finite bitstrings to $\{0,1\}^{256}$ that is equal to SHA256 except that the string consisting of 256 zero bits is mapped to itself. We do not know many more preimages for this function than for SHA256, since for SHA256 we can compute image-preimage pairs at will. Collision resistance and second-preimage resistance are likewise the same as for SHA256 unless a preimage of zero can be exhibited for SHA256. So $H$ should be a cryptographically strong hash function, but it is trivial to find a cycle in it.
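The counterexample is easy to write down concretely. A minimal sketch using Python's `hashlib`, where the names `ZERO` and `H` are chosen here for illustration:

```python
import hashlib

ZERO = bytes(32)  # the string of 256 zero bits

def H(m: bytes) -> bytes:
    """SHA-256, except that the 256-bit all-zero string is mapped to itself."""
    return ZERO if m == ZERO else hashlib.sha256(m).digest()
```

Here `ZERO` is a cycle of length 1 by construction, while on every other input `H` agrees with SHA256.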

I find it non-trivial to formalize the question you are asking. For instance, if the hash function in question has short cycles, then it is of course easy to imagine an adversary that just knows a short cycle and prints it. The same problems appear when one tries to rigorously define standard hash function security notions such as collision resistance; see e.g. Rogaway's paper on Collision-Resistant Hashing without the Keys [1] for further discussion.

On the other hand, if the adversary has to print a cycle, then that cycle will contain preimages for all the elements of the cycle. If the cycle has to contain one specific element not chosen by the adversary, and if the cycle length is of the order of magnitude expected for a random function with fixed-size output, this will break preimage security for elements of the function's range that are in short cycles. Of course, for a random function, most elements are not expected to be part of a cycle.

If you are asking whether deciding if an element is part of a cycle is necessarily hard for a strong cryptographic hash function (i.e. an efficiently computable, collision-resistant, preimage-resistant, second-preimage-resistant function), then I think it will at least be difficult to prove that. For instance, suppose $H_1$ is a strong cryptographic hash function with output size $n$ and that $H_2$ is a one-way permutation on $n$-bit inputs. If we set $F(x) := H_1(x)$ for $x$ of bit-size $\neq n$ and $F(x) := H_2(x)$ otherwise, then exactly the $n$-bit inputs are part of some cycle. $H_2$ should not introduce problems with collision resistance, since it is itself collision-free and collisions with $H_1$ should be hard to compute for random $H_1$ and $H_2$. To prove the conjecture that cycle membership is hard to decide for cryptographic hash functions, one would then have to show at least that efficiently computable permutations cannot be preimage-secure in the way preimage security is commonly defined for hash functions. I do not know if this is true (I would weakly conjecture it is not, although all simple examples that come to mind fail), but if there were a simple proof either way, I feel it would be widely known.