Friday, June 26, 2015

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week, we discuss the difference between a covert channel and a side-channel.

Covert channels and side-channels are two types of
information leakage channels.

A covert channel uses mechanisms that are not intended for
communications, e.g., writing and checking if a file is locked to convey a “1”
or “0”. In a covert channel an insider process leaks information to an outsider
process not normally allowed to access that information. The insider (sending)
process could be a Trojan horse program previously inserted stealthily into the
computer. An outsider (receiving) process need only be an unprivileged process [1].

In side-channel attacks, also known as passive non-invasive attacks, the cryptographic device is essentially attacked as it is, i.e. only directly accessible interfaces are exploited. The device is not permanently altered and therefore no evidence of an attack is left behind. The basic idea of side-channel attacks is to determine the secret key of a cryptographic device by measuring its execution time, its power consumption, or its electromagnetic field [2].

In a physical side-channel attack, unconventional techniques
are used to deduce secret information. Typically, the device has been stolen or
captured by the adversary who then has physical access to it for launching a
physical side-channel attack. Traditional side-channel attacks involved
differential power analysis and timing analysis. Different amounts of power (or
time) used by the device in performing an encryption can be measured and analysed
to deduce some or all of the key bits. The number of trials needed in a power
or timing side-channel attack can be far smaller than that needed in
mathematical cryptanalysis [1].

In software side-channel attacks a victim
process inadvertently assumes the role of the sending process, and a listening
(attacker) process assumes the role of the receiving process. If the victim
process is performing an encryption using a secret key, a software side-channel
attack allows the listening process to get information that leads to partial or
full recovery of the key [1].
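
As a toy illustration of a timing side-channel in software, the sketch below assumes a victim that checks a guess against a secret using an early-exit byte comparison. The number of comparisons performed stands in for the execution time an attacker would measure. The names `leaky_equals` and `recover_secret`, and the secret value, are hypothetical and are not taken from [1].

```python
def leaky_equals(secret, guess):
    """Early-exit comparison. The second return value counts byte
    comparisons and stands in for the measured execution time."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps


def recover_secret(oracle, length):
    """Recover the secret one byte at a time from the leaked step count."""
    known = bytearray(length)
    for pos in range(length):
        for candidate in range(256):
            known[pos] = candidate
            ok, steps = oracle(bytes(known))
            # A correct byte lets the comparison run past position pos.
            if ok or steps > pos + 1:
                break
    return bytes(known)


SECRET_KEY = b"k3y!"  # hypothetical secret held by the victim process
recovered = recover_secret(lambda g: leaky_equals(SECRET_KEY, g), len(SECRET_KEY))
```

The attacker here needs at most 256 guesses per byte instead of searching all byte strings of that length, which is the sense in which the leakage reduces the number of trials.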

Sunday, June 21, 2015

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week, we continue the mathematical attacks with the NFS algorithm.

The Number Field Sieve (NFS) is currently the most efficient known algorithm for factoring large general integers. Its running time depends on the size of the number to be factored but not on the size of its factors. NFS is based on the idea of factoring by congruent squares: given a large integer $N$, we want to find two integers $x$ and $y$ such that $x^2 \equiv y^2 \pmod{N}$. Then, hopefully, $\gcd(x-y,N)$ is a non-trivial factor of $N$ (this is the case whenever $x \not\equiv \pm y \pmod{N}$).

We roughly outline how NFS works. The first step of the algorithm is to choose two monic, irreducible polynomials $f_1$ and $f_2$ of small degrees $d_1$ and $d_2$, together with an integer $m \in \mathbb{Z}$ that is a common root of the two polynomials modulo $N$, i.e. $f_1(m) \equiv f_2(m) \equiv 0 \pmod{N}$. Let $\theta_1, \theta_2 \in \mathbb{C}$ be complex roots of $f_1$ and $f_2$ respectively. These define two algebraic number fields $\mathbb{Q}(\theta_i)$, where $i=1,2$, and we work in the two number rings $\mathbb{Z}[\theta_i]$, with multiplication defined as polynomial multiplication (modulo $f_i$). Then we define the homomorphisms $\phi_i : \mathbb{Z}[\theta_i] \rightarrow \mathbb{Z}_N$, which map $\theta_i$ to $m$ (where $i=1,2$).

The NFS algorithm aims to find two squares $\gamma_1^2$ and $\gamma_2^2$, one from each of the two number rings, such that $\gamma_1^2= \prod_{(a,b) \in S}(a-b\cdot \theta_1)$ and $\gamma_2^2= \prod_{(a,b) \in S}(a-b\cdot \theta_2)$, where $\gamma_1 \in \mathbb{Z}[\theta_1]$, $\gamma_2 \in \mathbb{Z}[\theta_2]$ and $S$ is a finite set of coprime integer pairs $(a,b)$. In order to find such a set $S$, we sieve over pairs $(a,b)$, looking for those for which $a-b\cdot \theta_i$ is smooth over some algebraic factor base. How fast we can find the set $S$ is the key to the efficiency of the algorithm.

Next, we need to extract the square root of $\gamma_i^2$ to obtain $\gamma_i$, where $i=1,2$. The methods of Couveignes [1] and Montgomery [2] can be used here. Once the two square roots are calculated, we apply the homomorphisms. Since both $\phi_1$ and $\phi_2$ send $a-b\cdot\theta_i$ to $a-b\cdot m$, we have $\phi_1(\gamma_1)^2 \equiv \phi_2(\gamma_2)^2 \pmod{N}$, and we expect $\gcd(N,\phi_1(\gamma_1)-\phi_2(\gamma_2))$ to be neither $1$ nor $N$, i.e. a non-trivial factor of $N$.
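
To make the congruent-squares idea concrete, here is a minimal sketch that finds $x$ and $y$ by naive search rather than by sieving. It is not the NFS itself, only the final gcd step on a toy modulus; the function name and the choice $N = 8051$ are assumptions made for illustration.

```python
from math import gcd, isqrt


def congruent_squares_factor(n):
    """Search for x such that x^2 mod n is a perfect square y^2,
    then try gcd(x - y, n). Exhaustive search, only sensible for tiny n."""
    for x in range(isqrt(n) + 1, n):
        y2 = (x * x) % n
        y = isqrt(y2)
        if y * y == y2:
            d = gcd(x - y, n)
            if 1 < d < n:
                return x, y, d
    return None


x, y, d = congruent_squares_factor(8051)
print(x, y, d)  # 90 7 83, since 90^2 - 7^2 = 8051 = 83 * 97
```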

[1] Couveignes, Jean-Marc. "Computing a square root for the number field sieve." The development of the number field sieve. Springer Berlin Heidelberg, 1993. 95-102.
[2] Montgomery, Peter L. "Square roots of products of algebraic numbers." Mathematics of Computation (1993): 567-571.

Physically Unclonable Functions
(PUFs) can be roughly thought of as `random' functions accepting a challenge
(typically a sequence of bits) as input, and generating a response (a
different sequence of bits) that is unique for each PUF and for each physical
instance. More precisely, a PUF is a physical device that produces unclonable
challenge-response pairs (CRPs); this means that the input/output behavior of any physical
copy of a PUF will differ from that of the original due to some
uncontrollable randomness in the copying process.

PUFs are emerging hardware
primitives which can be used, for example, in key generation applications,
replacing the more conventional non-volatile memory (NVM); thus, instead of
storing the secret key in digital memory, a PUF allows it to be derived from the
physical characteristics of the integrated circuit (IC), consequently reducing the risk of
physical and invasive attacks.

Unfortunately, there are two main issues concerning PUFs, namely a lack
of robustness and a lack of unpredictability. In some applications we would like to
obtain the same response every time the corresponding challenge is queried (for
example to enable repeatable key generation), but often, due to noise, the
responses are not perfectly reproducible, so that the same challenge c yields
different CRPs (c, r) and (c, r'). Moreover, the response bits are quite likely to be
non-uniformly distributed, especially when the number of CRPs is very large. While fixing the latter
problem is relatively easy, for example using hash functions, obtaining
robustness is more involved.

To overcome both these issues it
is necessary to implement additional post-processing logic. There are
essentially two different solutions: fuzzy extractors [1], which perform both error correction (using, for
example, BCH codes) and privacy amplification (applying hash functions), and Pattern Matching Key
Generators (PMKGs) [2].

In their work, Delvaux and Verbauwhede describe an attack
on PMKGs and also propose a countermeasure to it.

Pattern Matching Key Generators – Description

At a high level we can say that
this approach reverses the standard challenge-response format of a PUF.

To describe a PMKG we distinguish
an Enrollment phase and a Reconstruction
phase.

Enrollment. Consider a stream (Resp) of PUF
response bits, corresponding to a certain number of challenges, and call
any run of W consecutive bits of Resp a pattern. If Resp consists
of L+W – 1 bits, then there are L possible patterns.

Select one of these patterns
at random (using an external interface) and store the index j corresponding
to it. The corresponding response bits (Patt) are
published and form the Public Helper Data (Pub).

Note that it is the index j, and not the response bits, that is kept secret and hence used to derive the secret key;
each index provides log_2(L) bits, assuming L=2^k for some
positive integer k.

Reconstruction. To recover
the key, the PUF is iterated through a deterministic set of challenges,
obtaining Resp'_i, i=1, ..., H (Resp'_i
can be seen as Resp_i+Noise_i). A pattern matching procedure
is then performed for every round. Since Resp'_i contains some noise,
the pattern Patt'_i corresponding to the public Patt_i should be
the (only) one which satisfies d(Patt_i,Patt'_i)
= t <= T, where T is a fixed and well-chosen threshold value, and d denotes the Hamming distance.
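
The enrollment and reconstruction steps can be sketched for a single round as follows. This is only an illustration under stated assumptions: the PUF response is simulated with a seeded pseudo-random generator, the noise is modelled as three bit flips, and the parameters W, L and T as well as the function names are hypothetical choices, not those of the paper.

```python
import random


def hamming(u, v):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(u, v))


def enroll(resp, W, rng):
    """Pick a secret index j and publish the W-bit pattern starting there."""
    L = len(resp) - W + 1
    j = rng.randrange(L)
    return j, resp[j:j + W]


def reconstruct(noisy_resp, patt, T):
    """Return every index whose pattern lies within distance T of patt."""
    W = len(patt)
    L = len(noisy_resp) - W + 1
    return [i for i in range(L) if hamming(noisy_resp[i:i + W], patt) <= T]


rng = random.Random(2015)          # stands in for the physical randomness
W, L, T = 32, 64, 4
resp = [rng.randrange(2) for _ in range(L + W - 1)]
j, patt = enroll(resp, W, rng)     # j is secret, patt is public helper data

noisy = list(resp)                 # Resp' = Resp + Noise
for pos in rng.sample(range(len(noisy)), 3):
    noisy[pos] ^= 1

matches = reconstruct(noisy, patt, T)
```

Reconstruction succeeds when `matches` contains exactly the secret index; an empty list corresponds to a pattern miss and any extra index to a pattern collision.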

Pattern Matching Key Generators – Attack

To describe the attack, the
authors first model the failures of a PMKG. There are
two possible failures in key reconstruction: pattern misses and pattern
collisions. The first occurs when t > T for the correct pattern, and the second occurs if t <= T for some index j' not belonging to the secret
sequence of indices. If we denote by P_MISS and P_COLL the probability of a
pattern miss and of a pattern collision, respectively, it is possible to prove that:

P_FAIL = 1 - (1-P_MISS)^H (1-P_COLL)^H,

where P_FAIL indicates the
overall failure probability. Intuitively it is clear that pattern misses occur
when T is small, whereas pattern collisions are more probable when T is large.
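
As a quick numerical companion to the formula above, the following sketch evaluates P_FAIL for given per-round probabilities. The example values for P_MISS, P_COLL and H are made up for illustration and do not come from the paper.

```python
def p_fail(p_miss, p_coll, H):
    """Overall failure probability over H rounds, following
    P_FAIL = 1 - (1 - P_MISS)^H * (1 - P_COLL)^H."""
    return 1.0 - ((1.0 - p_miss) ** H) * ((1.0 - p_coll) ** H)


# Hypothetical per-round probabilities, H = 8 rounds.
example = p_fail(0.01, 0.001, 8)
```

Even small per-round probabilities compound over the H rounds, which is why the threshold T has to balance misses against collisions.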

In a nutshell, the attack
presented in the paper, named SNAKE due to its similarity to the
well-known video game, exploits malicious modifications of the public helper
string Pub as follows. The idea is to drop the last (rightmost)
bit of Pub and insert a random bit in the first (leftmost) position.
In this way the first unexposed bit, immediately to the left of Pub, is retrieved via the statistical
properties of the overall failure probability P_FAIL. The same procedure can then
be repeated, moving along the PUF response string like a snake.
When a consistent change in the failure rate occurs, the secret index j is revealed.
Wednesday, June 10, 2015

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography':
a set of questions compiled to give PhD candidates a sense of what they
should know by the end of their first year. We continue the
mathematical attacks with a description of an index calculus attack...

What is the objective?

An index calculus attack is a method for trying to solve the discrete logarithm problem (DLP).
Very briefly, it works by writing the target value as a product of
powers of elements in a factor base, elements whose logarithms are already
known, and then extracting the logarithm of the target value through the laws of logarithms. We now
proceed to explain what that means in a bit more detail.

How does it work?

The algorithm can be applied to calculating the discrete logarithm of an arbitrary element $h$ of any cyclic group $G=\langle g \rangle$. We will rely on the fact that if $x^ay^bz^c=1$, then $a*\log_g(x)+b*\log_g(y)+c*\log_g(z)=\log_g(1)=0$ (modulo the order of $g$). So,
if we can find some collection of $x_i$ whose logarithms are all known
values $L_i=\log_g(x_i)$ and somehow manage to write $h=x_1^{a_1}\dots
x_r^{a_r}$, then we know that $\log_g(h)=a_1*L_1+\dots+a_r*L_r$. The
index calculus attack exploits this, and the efficiency (or
inefficiency) of the attack comes down to how fast the various stages of
this can be done. For
context, alongside the generic technique, we will follow an example in
terms of the discrete logarithm over the multiplicative group of
$\mathbb{Z}/p\mathbb{Z}$ with generator $g$, the most common
application. Being a little lazy, we will use the terms "offline
computation" and "precomputation" interchangeably to refer to work that
need only be done once per group. Similarly, "online" and "every time"
work refers to work that must be done for every DLP instance.

(Precomputation, basically free) Choose a Factor Base.

The
factor base is a collection of elements $\{b_0=g,b_1,\dots,b_r\}\subseteq G$.
How to pick them, and how many to pick, depend on the group we're
working over and the running times of the later stages.
Indeed, the choice of $r$ generally leads to a trade-off between
expensive online (small $r$) and expensive offline (large $r$) computation. Working
within our example, one would generally pick $-1$ and the first $r$
primes, since these tend to make the online calculations more efficient
(see below).

(Precomputation, expensive but very parallel) Find relations between the DLPs of the Factor Base elements.

Using whatever techniques we can (generally just taking arbitrary products and hoping to get lucky!)
we find equations relating the different factor base elements
to each other. By taking logs, these translate into
linear relations between their discrete logarithms. We continue
searching for these until we have found $r$ independent relations, which
clearly takes longer the bigger we make $r$. That said, this can easily
be done in parallel by simply asking each process to search
independently and then merging the result sets. Our example works in exactly this way.
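
A minimal sketch of relation collection in the running example might look as follows. The prime $p = 1019$, the generator $g = 2$ and the factor base $\{2, 3, 5, 7\}$ are toy values chosen for illustration (the element $-1$ is left out for simplicity), and the sketch does not check that the relations it collects are linearly independent.

```python
import random


def factor_over_base(v, base):
    """Return exponents e with v == prod(base[i] ** e[i]),
    or None if v is not smooth over the base (trial division)."""
    exps = [0] * len(base)
    for i, b in enumerate(base):
        while v % b == 0:
            v //= b
            exps[i] += 1
    return exps if v == 1 else None


def find_relations(p, g, base, count, rng):
    """Try random exponents k and keep those for which g^k mod p is smooth.
    Each hit means k = sum(exps[i] * log_g(base[i])) modulo p - 1."""
    relations = []
    while len(relations) < count:
        k = rng.randrange(1, p - 1)
        exps = factor_over_base(pow(g, k, p), base)
        if exps is not None:
            relations.append((k, exps))
    return relations


P, G = 1019, 2            # toy parameters, 2 generates the group mod 1019
BASE = [2, 3, 5, 7]
relations = find_relations(P, G, BASE, 8, random.Random(1))
```

Each worker in a parallel search could run `find_relations` with its own random generator and the result lists would simply be concatenated.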

(Precomputation, relatively cheap) Solve the Factor Base DLPs

From the previous step, we have a number of linear relationships between the DLs of the factor base elements. In
particular, we have $r+1$ equations in $r+1$ variables (since
$\log_g(g)=1$ is known a priori), and so can solve to find all their
logarithms. Whilst this requires using a large matrix solver, it tends
to be basically free compared to the previous and next steps, since
solving linear equations is much more efficient than the almost
exhaustive nature of searching for relations.

(Online, expensive but very parallel) Write $h$ as a product of factor base elements

We
now try to find a value $y$ and a list of exponents $a_i$ such that $h g^y =
b_1^{a_1} \dots b_r^{a_r}$. This can easily be done in parallel, since
each process tries a different collection of $y$ values, stopping as
soon as one of them succeeds. Once that's done, we simply take logs of both
sides, giving:

$$\log_g(h) = -y + L_1a_1 + \dots + L_r a_r$$

Now,
I've skimmed over a big issue in that previous paragraph: how do we
find this $y$? Well, in the case of our example it's not too bad. Because
the factor base elements are all small primes, we simply try to factor $hg^y$
using traditional trial-division-like techniques. However, in other groups
this can be very difficult indeed, and computationally impractical.
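
The online step can be sketched for the running example as below. To keep the block short, the factor base logarithms are obtained by exhaustive search, which is only a stand-in for the relation collection and linear algebra stages and would defeat the purpose for a realistic $p$. The parameters $p = 1019$, $g = 2$ and the base $\{2, 3, 5, 7\}$ are toy assumptions.

```python
def factor_over_base(v, base):
    """Exponent vector of v over the base, or None if v is not smooth."""
    exps = [0] * len(base)
    for i, b in enumerate(base):
        while v % b == 0:
            v //= b
            exps[i] += 1
    return exps if v == 1 else None


def brute_force_logs(p, g, base):
    """Toy stand-in for the precomputation: log_g of every base element."""
    table, acc = {}, 1
    for k in range(p - 1):
        table.setdefault(acc, k)
        acc = acc * g % p
    return [table[b] for b in base]


def index_calculus_online(p, g, h, base, logs):
    """Find y with h * g^y smooth, then log_g(h) = -y + sum(a_i * L_i)."""
    for y in range(p - 1):
        exps = factor_over_base(h * pow(g, y, p) % p, base)
        if exps is not None:
            return (sum(a * l for a, l in zip(exps, logs)) - y) % (p - 1)
    return None


P, G = 1019, 2
BASE = [2, 3, 5, 7]
LOGS = brute_force_logs(P, G, BASE)
h = pow(G, 777, P)
x = index_calculus_online(P, G, h, BASE, LOGS)
print(x)  # 777
```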

A very brief conclusion

So, the Index Calculus algorithm uses the fact that taking logarithms
transforms multiplications into sums to try and find the discrete
logarithm of a particular point. It does this by building up a table of
known results (the factor base), then finding an element related to the
target that can be easily written in terms of these. As such, the
algorithm is very generic, and by changing the size of the factor base
$r$ one recovers a number of obvious classical attacks. However, picking
a value of $r$ such that every stage of the computation can be done
efficiently is generally not possible, since either the precomputation
or online computation (or often both!) will be prohibitively expensive.

Friday, June 5, 2015

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. In this blog post we discuss the Pollard rho, Pollard "kangaroo" and parallel Pollard rho attacks on ECDLP.

Our aim is to solve the discrete logarithm problem $h = g^x$
in a finite cyclic abelian group G. Thus, assuming that we have a cyclic group G = ⟨g⟩ of prime order p,
and that we are given an h ∈ G, we want to find the
value of x modulo p such that $h = g^x$.

The problem
with the Baby-Step/Giant-Step method is that although its run time complexity is O(√p), it also requires O(√p)
space. Hence, we are interested
in replacing the large space requirement with a smaller one, while maintaining
a time complexity of O(√p). This can be achieved with
the following algorithms [1].

1) Pollard’s Rho Algorithm.

Let f : S → S be a random
mapping from a set S to itself, where n is the size of S. For a random value $x_0 \in S$ we compute $x_{i+1} = f(x_i)$
for i ≥ 0. Each step $x_{i+1} = f(x_i)$ is a
deterministic function of the current position $x_i$. The sequence
$x_0, x_1, x_2, \ldots$ is called a deterministic random
walk.

Since S is
finite we will eventually obtain $x_i = x_j$ for some i ≠ j, and thus
$x_{i+1} = f(x_i) = f(x_j) = x_{j+1}$. Hence, the
sequence $x_0, x_1, x_2, \ldots$ will eventually
become cyclic (giving the “rho” shape: ρ). Our goal is to find a collision
in a random mapping like the one above, that is, to find two values $x_i$
and $x_j$ with i ≠ j such
that $x_i = x_j$.

To find a
collision we use Floyd’s cycle-finding algorithm: given $(x_1,x_2)$
we compute $(x_2,x_4)$, then $(x_3,x_6)$
and so on, i.e. given the pair $(x_i, x_{2i})$ we compute
$(x_{i+1},x_{2i+2}) = (f(x_i),f(f(x_{2i})))$, and we stop when we find
$x_m = x_{2m}$. We expect m = O(√n).

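To solve a discrete logarithm, each $x_i$ is kept in the form $g^{a_i}h^{b_i}$ with known exponents, so that a collision $x_m = x_{2m}$ gives a linear equation in x. Assuming that the sequence $x_0,x_1,x_2,\ldots$ is produced by a random mapping from G to itself, the above algorithm will find the discrete logarithm in expected time O(√n), where n is the order of G.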
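
The following sketch applies Floyd's cycle finding to a discrete logarithm in a small subgroup. The three-way split of the group used for the walk is a common textbook choice rather than something prescribed above, and the parameters (the subgroup of order 509 generated by 4 modulo 1019) are toy assumptions.

```python
import random


def pollard_rho_dlog(g, h, p, q, seed=0):
    """Solve h = g^x (mod p) where g has prime order q.
    Every walk element x_i is tracked as g^a * h^b."""
    rng = random.Random(seed)

    def step(x, a, b):
        r = x % 3                      # split the group into three parts
        if r == 0:
            return x * x % p, 2 * a % q, 2 * b % q
        if r == 1:
            return x * g % p, (a + 1) % q, b
        return x * h % p, a, (b + 1) % q

    while True:
        a0, b0 = rng.randrange(q), rng.randrange(q)
        start = (pow(g, a0, p) * pow(h, b0, p) % p, a0, b0)
        tortoise = step(*start)            # x_1
        hare = step(*step(*start))         # x_2
        while tortoise[0] != hare[0]:      # stop when x_m == x_2m
            tortoise = step(*tortoise)
            hare = step(*step(*hare))
        (_, a, b), (_, A, B) = tortoise, hare
        # g^a h^b = g^A h^B  =>  x = (A - a) / (b - B) mod q
        if (b - B) % q != 0:
            return (A - a) * pow((b - B) % q, -1, q) % q
        # Degenerate collision: retry from a new random starting point.


P, Q, G = 1019, 509, 4
x = pollard_rho_dlog(G, pow(G, 123, P), P, Q)
print(x)  # 123
```

Only the tortoise and the hare are stored at any time, which is the constant-space property that motivates the method.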

2) Pollard’s Kangaroo Method.

Pollard’s Kangaroo
method is like the Rho method but it is particularly tuned to the situation
where we know that the discrete logarithm lies in a certain interval x ∈ [a,...,b].

Let w = b − a be the length of the interval in which the discrete logarithm x is known to
lie. We define a set $S = \{s_0,\ldots,s_{k-1}\}$ of integers in non-decreasing order, whose mean m should be around N = √w. We usually choose $s_i = 2^i$ for 0 ≤ i < k (thus the mean of the set is $m \approx \frac{2^k}{k}$) and also $k \approx \frac{1}{2}\log_2(w)$. The group is divided up into k sets $S_i$, for i = 0, . . . , k − 1. We then define the deterministic
random walk: $x_{i+1} = x_i \cdot g^{s_j}$ if $x_i \in S_j$.

We compute the deterministic random walk, starting from $g_0 = g^b$, by setting $g_i = g_{i-1} \cdot g^{s_j}$ for i = 1,...,N. We also set $c_0 = b$ and $c_{i+1} = c_i + s_j \pmod q$, where q is the order of g. We store $g_N$ and notice that we have computed the discrete logarithm of $g_N$ with respect to g, which is $c_N = \log_g(g_N)$.

Now we have to compute the second deterministic random walk, starting from the unknown point x in the
interval. We set $h_0 = h = g^x$ and compute $h_{i+1} = h_i \cdot g^{s'_j}$. We also set $d_0 = 0$ and $d_{i+1} = d_i + s'_j \pmod q$. Notice that we have $\log_g(h_i) = x + d_i$. Hence, if the path of the $h_i$ meets the path of the $g_i$, then the $h_i$ will carry on along the path of the $g_i$. We will then be able to find a value M where $h_M$ equals our stored point $g_N$. Thus, we will have
$c_N = \log_g(g_N) = \log_g(h_M) = x + d_M$, and the solution to our discrete logarithm problem is given by
$x = c_N - d_M \pmod q$.

If we do not get a collision then we can increase N and continue both walks in a similar manner until a collision does occur. The expected running time of this method is O(√w) and the storage can be seen to be
constant.
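
A rough sketch of the two walks is given below, with jumps $s_i = 2^i$ and the jump chosen from the group element modulo k. Two details are assumptions of this sketch rather than part of the description above: the tame walk is always extended until it is ahead of any position the wild walk could have reached, and the search gives up (returning `None`) after a fixed number of rounds. The group (generated by 2 modulo 1019, of order 1018) and the interval [0, 256] are toy values.

```python
from math import isqrt


def kangaroo_dlog(g, h, p, order, a, b, max_rounds=32):
    """Look for x in [a, b] with g^x = h (mod p).
    Returns None if the two walks have not met after max_rounds."""
    w = b - a
    k = max(1, w.bit_length() // 2)        # k ~ (1/2) * log2(w)
    jumps = [1 << i for i in range(k)]     # s_i = 2^i
    n_steps = max(1, isqrt(w))             # N ~ sqrt(w)

    def hop(elem, dist):
        s = jumps[elem % k]                # element in S_j when elem % k == j
        return elem * pow(g, s, p) % p, dist + s

    tame, c = pow(g, b, p), b              # tame walk starts at g^b, c_0 = b
    wild, d = h % p, 0                     # wild walk starts at h, d_0 = 0
    for _ in range(max_rounds):
        while c <= b + d:                  # get ahead of the wild walk
            tame, c = hop(tame, c)
        for _ in range(n_steps):           # N more steps, then set the trap
            tame, c = hop(tame, c)
        while True:
            if wild == tame:
                return (c - d) % order     # x = c_N - d_M
            if a + d > c:                  # wild walk has passed the trap
                break
            wild, d = hop(wild, d)
    return None


P, ORDER, G = 1019, 1018, 2
found = kangaroo_dlog(G, pow(G, 200, P), P, ORDER, 0, 256)
```

Only the current tame point (the trap) and the current wild point are kept, so the storage stays constant as described.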

3) Parallel Pollard’s Rho Method.

When we use random walk based techniques for solving discrete logarithm problems we often use a parallel version of Pollard's rho. Assuming that we are given the discrete logarithm problem $h = g^x$ in a group G of prime order q, we first decide on an easily computable function H : G → {1 , . . . , k} (k is usually around 20) and then we define a set of multipliers $m_i$. These are produced by generating random integers $a_i, b_i \in [0, \ldots, q-1]$ and then setting
$m_i = g^{a_i}h^{b_i}$.

To start the deterministic random walk we randomly pick $s_0, t_0 \in [0, \ldots, q-1]$ and compute
$g_0 = g^{s_0}h^{t_0}$. The deterministic random walk is then defined on the triples $(g_i,s_i,t_i)$ where $g_{i+1} = g_i \cdot m_{H(g_i)}$, $s_{i+1} = s_i + a_{H(g_i)} \pmod q$, $t_{i+1} = t_i + b_{H(g_i)} \pmod q$. Hence, for every $g_i$ we record the values of $s_i$ and $t_i$ such that $g_i = g^{s_i}h^{t_i}$.

If we assume that we have m processors, then each processor can start a different deterministic random walk from
a different starting position, using the same algorithm to determine the next element in the walk.
When two processors (or even the same processor) meet an element of the group that has been seen
before, we obtain the equation $g^{s_i}h^{t_i} = g^{s'_j}h^{t'_j}$, from which the discrete logarithm can be solved as $x = (s_i - s'_j)/(t'_j - t_i) \pmod q$, provided $t_i \neq t'_j$. We expect that after O($\sqrt{πq/2}$/m) iterations of these parallel walks, a collision will be found and the discrete logarithm problem will be solved. However, this means that each processor needs to return every element in its computed deterministic random walk to a central server, which then stores all the computed elements. This is highly inefficient due to the large storage requirement, namely O($\sqrt{πq/2}$).

Fortunately, the storage can be reduced to any required value as follows. We define a function d on the group, d : G → {0, 1}, such that d(g) = 1 around $1/2^t$ of the time. The function d is often defined by returning d(g) = 1
if, for example, a certain subset of t of the bits representing g are set to zero. The elements in G for
which d(g) = 1 will be called distinguished.
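
The walk on triples and the distinguished-point test can be sketched in a single process that simulates several walkers in turn, with a dictionary playing the role of the central server, to which a walker reports only when it reaches a distinguished element. Several choices here are assumptions for illustration: H is taken to be the element modulo k, "distinguished" means the low t bits are zero, a walker restarts from a fresh random point after reporting (or after walking too long without reporting), and the group is the toy subgroup of order 509 generated by 4 modulo 1019.

```python
import random


def parallel_rho_dlog(g, h, p, q, walkers=4, k=20, t=2, seed=1):
    """Solve h = g^x (mod p), g of prime order q, with simulated walkers
    that report only distinguished points to a shared table."""
    rng = random.Random(seed)
    ab = [(rng.randrange(q), rng.randrange(q)) for _ in range(k)]
    mult = [pow(g, a, p) * pow(h, b, p) % p for a, b in ab]  # m_i = g^a_i h^b_i

    def fresh():
        s, u = rng.randrange(q), rng.randrange(q)
        return pow(g, s, p) * pow(h, u, p) % p, s, u         # g_0 = g^s0 h^t0

    def step(elem, s, u):
        i = elem % k                                         # H(elem)
        a, b = ab[i]
        return elem * mult[i] % p, (s + a) % q, (u + b) % q

    def distinguished(elem):
        return elem & ((1 << t) - 1) == 0                    # low t bits zero

    server = {}                          # distinguished element -> (s, u)
    states = [fresh() for _ in range(walkers)]
    ages = [0] * walkers
    while True:
        for i in range(walkers):
            states[i] = step(*states[i])
            ages[i] += 1
            elem, s, u = states[i]
            if distinguished(elem):
                if elem in server and (server[elem][1] - u) % q != 0:
                    s0, u0 = server[elem]
                    # g^s h^u = g^s0 h^u0  =>  x = (s - s0) / (u0 - u) mod q
                    return (s - s0) * pow((u0 - u) % q, -1, q) % q
                server.setdefault(elem, (s, u))
                states[i], ages[i] = fresh(), 0
            elif ages[i] > (64 << t):    # stuck in a cycle: restart the walk
                states[i], ages[i] = fresh(), 0


P, Q, G = 1019, 509, 4
x = parallel_rho_dlog(G, pow(G, 123, P), P, Q)
print(x)  # 123
```

The table only ever holds distinguished elements, so its size shrinks roughly by a factor of $2^t$ compared with storing every step of every walk.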

It is only the distinguished group elements which are now transmitted back to the central server, which means that we expect the deterministic random walks to continue another $2^t$ steps
before a collision is detected between two deterministic random walks. Hence, the computing time
now becomes O($\sqrt{πq/2}$/m + $2^t$) and the storage becomes O($\sqrt{πq/2}$/$2^t$). Thus, storage can be reduced to any manageable amount, at the expense of a little extra
computation.

[1] http://www.cs.bris.ac.uk/~nigel/Crypto_Book/book.ps (pages 208 - 214)