Anonymous credentials aim to provide a similar functionality while at the same time not revealing information about the user’s identity when obtaining or showing a credential. In order to construct an anonymous credential system, it is sufficient to exhibit a commitment scheme, a signature scheme, and efficient protocols for (1) proving equality of two committed values; (2) getting a signature on a committed value (without revealing this value to the signer); and (3) proving knowledge of a signature on a committed value.

Chase and Lysyanskaya (in CRYPTO'06) give delegatable anonymous credentials. The functionality of the system can be sketched as follows: each user holds a secret key which she uses to produce multiple pseudonyms Nym. A user A can be known to user O as Nym_{A}^{(O)} and to B as Nym_{A}^{(B)}. Given a credential issued by O for Nym_{A}^{(O)}, A can transform it into a credential for Nym_{A}^{(B)} and show it to B. Moreover A can delegate the credential to user C, known to A as Nym_{C}^{(A)}. C can then show a credential from O for Nym_{C}^{(D)} to user D (without revealing neither Nym_{A}^{(C)} nor Nym_{C}^{(A)} ), or redelegate it, and so on.

Jan Camenisch and Thomas Groß (in CCS'08) extend the Camenisch-Lysyanskaya credential system (in SCN'02) with a finite-set encoding. It enables the efficient selective disclosure of binary and discrete-values attributes which are partially highly privacy-sensitive and require a selective disclosure of one attribute while hiding others completely. This method overcomes the severe limitations of existing schemes. We require a solution with two key properties: (1) It only uses at most one attribute base for all binary and finite-set attributes. (2) It only impacts the proof complexity by the number of used attributes instead of the total number.

The major building block of Camenisch-Groß anonymous credential system is Camenisch-Lysyanskaya signatures (in SCN'02). The CL Signatures are as follows:CL Signatures

Let and L_m, L_e, L_n, L_r andL be system parameters. is a security parameter.

Efficiently proving that a credential contains an attribute with a given value

The user can convince the verifier that E encodes a given attribute, e.g, she can prove that her identity card states that her hair color is blond. Assume that the attribute hair color blond is
encoded by the prime e_j . Thus to convince the verifier that she got issues a credential with this attribute, i.e., that e_j divides E included in her credential, the user engages with the following proof with the verifier:

Showing that an attribute is not contained in E, i.e., how to prove a NOT-relation

Proving that a given e_j is not contained in her credential, the user can do so by showing that there exist two integers a and b such that aE+be_j=1 . Note that a and b do not exist if e_j|E . The protocol is as follows: After having computed a and b, the user chooses a sufficiently large random r (about 80 bits larger n) and computes a commitment D=g^{E}h^{r} mod n. The user sends D to the verifier and runs the following protocol
with him (where a and b are the secret denoted by α and β, respectively). Finally, the user engages with the verifier in the proof:

A 'collision' in cryptanalysis traditionally denotes the noninjective mapping of different inputs to the same output of a cryptographic function. Side-channel collision analysis seeks to exploit collisions in the intermediate values of a function, without requiring that these propagate to the final output. Since cryptographic algorithms have, until recently, been designed to be secure in a block box setting -- i.e. assuming that the attacker cannot 'see' the intermediate values -- such internal collisions can occur quite frequently. Whilst it is true that they are harder to detect than collisions in the output, side-channel methodologies provide the possibility for finding them, so long as sufficient physical trace measurements can be acquired.

To find an internal collision in a block cipher, for example, an attacker might encrypt two plaintexts (under an unknown key) with a small, fixed differential, measuring the power consumption as an intermediate value (for example, an S-Box output) is computed. If the correlation between the two measurement vectors is high, this may indicate a collision. The plaintext differential associated with the collision can be looked up in a table which will give the possible input pairs producing such an output, consequently narrowing down the possible key candidates. Of course, many traces may need to be collected for many different plaintexts before sufficient collisions can be found to deduce the key, and several replicates per plaintext may need to be collected and averaged in order to diminish the confounding effect of noise. Moreover, they are 'chosen message' attacks as opposed to 'known message' attacks, implying a stronger adversary than many typical side-channel strategies. But one significant advantage is that, unlike DPA (say), which relies on a good 'power model' for the device leakage, the attacker does not need to know anything about the form of the device leakage as trace measurements are directly compared with each other rather than with hypothesis-dependent predictions.

The first paper focuses on the DES block cipher, which naturally lends itself to such an approach because of its noninjective (6-4) S-Boxes. However, as the authors explain, the key expansion step makes it impossible to force a collision in the output of any single S-Box within the round function. Collisions in three adjacent S-Boxes can be produced, which means the tested input pairs have a length of 18 bits and the differential tables must be built according to the concatenation of three S-Box differentials. The authors tested their strategy against simulated device leakage and measurements taken from a 8051 compatible microcontroller running a software implementation of DES. The minimum average number of traces required to cause a collision in S-Box triple 2,3,4 was 140, yielding 10.2 out of the 18 targeted key bits. For S-Box triple 7,8,1 the average number to cause a collision was 165, yielding 13.8 key bits.

The second paper tackles the slightly more difficult problem of detecting and exploiting internal collisions in AES -- which rather has an injective S-Box. Fortunately (from an attacker's perspective) it turns out that key-dependent collisions can occur in one of the output bytes of the MixColumns transformation. They show that collisions are produced within an average of only 20 measurements, and propose various ways of reducing trace and storage complexity, finally proving the effectiveness of their strategy in a simulated attack averaged over 10,000 random keys.

The third paper suggests improvements to the efficiency and reliability of the collision detection step, via a strategy which is (superficially) similar to correlation DPA: the key differential associated with a particular input pair is hypothesised and the power consumption of the resulting S-Box outputs are correlated with one another. This is repeated over all possible key differentials (and all time points), and the hypothesis producing the highest correlation is taken to indicate the correct differential (and collision time). Using this method they are even able to extend the attack against AES to a masked scenario, demonstrating (against expectation) that collisions can be found and exploited in a protected implementation.

Some of the points raised for discussion following the presentation of the papers included the applicability of the techniques to other collision detection methods (e.g. cache-based) and the likelihood and impact of false positives in the detection stage.

Saturday, October 20, 2012

This week I attended CCS in Raleigh, North Carolina. There were lots of interesting papers with a total of 81 being presented in all.

The conference began on Tuesday with the keynote talk from Virgil Gligor. In his talk he discussed how a general theory of trust should be built upon both behavioural trust and computational trust. Perhaps my favourite session of the day was that on TLS. The second talk of the session "Why Eve and Mallory Love Android: An Analysis of Android SSL (In)Security" presented a man in the middle attack on SSL as it is used by Android phones. The speaker told us that that 41 of the 100 apps they study were vulnerable allowing them to determine bank account information, Facebook credentials and much more. One attack presented during the talk showed how the authors were able to get a popular anti-virus app to accept a virus signature the authors had created for the anti-virus software being used. As a result the app would be detected as a virus and delete itself. The day ended with a buffet dinner and a concert from local bluegrass band the Steep Canyon Rangers.

Smart meters are a growing area of research in the security community. On Wednesday there were two papers on this subject. The second of these "Neighborhood Watch: Security and Privacy Analysis of Automatic Meter Reading Systems" looks at AMR meters that have been deployed for the past decade. While these are not actually smart meters they do use wireless technology to allow remote collection of meter readings. The authors were easily able to reverse engineer the protocol used and as a result were able to determine power useage within homes. Further to this it is possible to perform spoofing attacks enabling an attacker to reduce or increase a victim's utility bills. Currently there are no cryptographic security mechanisms used on these systems, therefore a simple fix would be to introduce a scheme to enable confidentiality and integrity of meter readings. In one of the final sessions on Wednesday Robert Lychev presented the paper "Provable Security of S-BGP and other Path Vector Protocols: Model, Analysis and Extensions" based on work with Alexandra Boldyreva. BGP is the standard for routing traffic across the Internet. S-BGP is an extension to this to ensure secure routing using a PKI. The paper presented is the first attempt to provide a formal provable security analysis of the protocol. In the evening there was the traditional poster session.

Thursday, October 18, 2012

This study group was chaired by
Theo Tryfonas and Theo Spyridopoulos and focused on applications of Game
Theory in Internet Security. The discussion was based on the work
presented in the following two publications:

In the first paper, introduced by Theo Tryfonas, the authors
present a survey and taxonomy of existing game-theoretic approaches
designed to enhance network security. Current research into network
security only considers non-cooperative games. Since they are one-shot,
static non-cooperative games are by definition of imperfect information
(at least one player is not aware of at least one other player's
previous moves). Conversely, dynamic games can be of either perfect or
imperfect information. Both game categories can be either complete
(every player knows all other players' strategies and payoffs) or
incomplete. After briefly discussing the characteristics of each game
type, they classify many existing game-theoretic network security
approaches into one of the investigated game categories.

During
the second part of the discussion, we took a dive into the details of
how game-theory can be employed to protect a network against DoS and DDoS
attacks and more specifically on attacks aiming to deplete the
network's bandwidth. The interaction between the malicious user and the
network administrator is modeled as a non-zero-sum game between two
players.

In this work, the authors investigate two scenarios:

The attacker controls a single node (DoS)

The attacker controls multiple nodes (DDoS)

The only countermeasure available to the defending player is a firewall. There is a pipe of limited bandwidth B connecting the firewall with the service under attack. If T is the rate of the traffic crossing the pipe, the attack is considered successful if T > B, resulting in some legitimate traffic getting dropped.

In
both situations, the attacker tries to choose the optimal number of
attacking nodes (in the DDoS scenario) and to optimise each node's
traffic transmission rate. The defender aims to configure the firewall
in a way that will block as much malicious traffic as possible, while
dropping as little legitimate traffic as possible.

The
authors first investigate a static, one-shot game, whereby the two
players decide on their strategy before the game starts and they are not
allowed to change it afterwards. Subsequently, the authors discuss a
dynamic game, whereby players are allowed to adjust their strategies as
the game progresses.

The scenarios adopt the following metrics:

Legitimate User Traffic: When not under attack (na: no attack), the network has traffic from n legitimate users with total rate Tna where Xi is the traffic sending rate of user i.

Tna = X1 + X2 + ... + Xn

If all Xi follow a Normal Distribution, then Tna will also follow a Normal Distribution. Additionally, the model makes the assumption that under normal operation, Tna < B with high probability.

The attacking player controls the number of attacking nodes (m) and each node's traffic sending rate rA (common for all nodes). The DoS scenario is a simplified version of the DDoS scenario, where m=1.
When the network is under attack, the traffic rate crossing the pipe is T:

T = X1 + X2 + ... + Xn + m * rA

The
firewall drops packets of a flow with a probability which is dependent
on the flow's rate. This probability is modeled by a sigmoid function.
As a result, the firewall drops both malicious as well as legitimate
packets. The defender controls the value of parameter M, which represents the flow rate which will be dropped with probability 0.5.

The attacker's aim is to increase va and vn while maintaining a low incurred cost vc.

va: ratio of bandwidth consumed by the attacker to bandwidth consumed by legitimate traffic

vn: lost users to total number of users ratio

vc: attacker's cost, proportional to the number of nodes used for the attack

After applying weights to each of those factors, the total payoff for the attacker is:

Va = wba * vb + wna * vn - wca * vc

The payoff for the defender is:

Vb = - wbb * vb - wnb * vn + wcb * vc

As
an example, the authors perform an analysis of the zero-sum version of
this game, whereby the attacker's and defender's weights are the same:

Va = -Vdwba = wbb = wb, wna = wnb = wn, wca = wcb = wc

The authors conclude analytically that the static game has a Nash equilibrium. Further investigation via simulations focuses on vb, the first component of player payoff. Results demonstrate that, in order to bypass the firewall, the attacker can increase m while reducing rA. They also demonstrate that there exists an optimal strategy for both players.

The
dynamic game is a sequence of time steps, with both the attacker and
defender having a payoff at each step. Each player gets an opportunity
to change strategy at each time step. The total payoff for each player
is the sum of those individual payoffs. Other than pointing out its
differences with the static game, this paper does not investigate the
dynamic game any further.

Wednesday, October 10, 2012

The by far best talk on Embedded Systems Security so far was given today by Nikil Dutt on "LRCG: Latch-based Random Clock-Gating for Preventing Power Analysis Side-Channel Attacks" which is joint work with K. Tanimura. The authors propose a time randomization against power analysis attacks which can be applied to ASIC implementations and have evaluated their proposal with simulated traces.

The main idea is to use latches (L) instead of Flip-Flops (FF):

traditional model:

in -> FF -> Circuit -> loop to FF /out

new model:

in -> LA -> Circuit1 -> LB -> Circuit2 -> loop to LA / out

where Circuit1 and Circuit2 together implement the same functionality as Circuit (i.e. one round of AES). The latches are not triggered by the clock (clk) signal directly but by

LA: not(clk XOR rand)

LB: clk XOR rand

where rand is a random bit which stays constant during the entire AES computation and is updated only with each new plaintext. Most imprtantly, this can be applied to slices of the circuit, e.g. with four slices corresponding to one column of the MixColumn operation. Thus they can start computation either in Circuit1 or Circuit2 depending on rand with different rand for each slice. Additionally, the authors modified the Synopsis Design Compiler for greater ease of implementation.

Using traces obtained from Synopsis NanoSim (post synthesis) and Synopsis' standard library SAED_EDK 90nm the authors generated overhead estimations and power consumption traces for four different implementations: an unprotected baseline, an implementation using this countermeasure and two comparison implementations using masking and WDDL respectively. The overhead estimations show that this is, both in area and energy consumption, a very efficient countermeasure.

They used the simulated power traces to run simple DPA and CPA attacks against all four implementations. Unlike the three comparison implementations, they failed to break their proposed countermeasure using up to 8192 traces. (The somewhat random number of traces was motivated by the considerable amount of time needed to generate the simulated traces.) On the one hand, these results are promising. On the other hand, power analysis results obtained from simulated traces are inherently unreliable but alas, one can not always tape-out a new chip. Also, the attacks do not use any of the fairly well established trace-alignment techniques; trace-alignment may very well defeat this countermeasure. To conclude, I believe this is an interesting idea deserving further research and evaluation.

There were no talks directly related to the security of embedded systems on Tuesday so this entry will be about something different: "Performance Enhancement under Power Constraints using Heterogeneous CMOS-TFET Multicores" by E. Kultursay, K. Swaminathan, V. Saripalli, V. Narayanan, M. T. Kandemir and S. Datta, all from Pennsylvania State University.

Within battery powered embedded systems, you usually have strict limits on your power budget - battery development has, unfortunately, not been able to keep up with the development of computing power. Computing power has been driven mainly by three factors: Smaller transistors, higher frequencies and more cores. But it wasn't possible to scale all aspects transistors comparable to their size and frequency behaviour. Most notably, some characteristic voltage levels like the threshold voltage Vth have remained fairly constant for CMOS devices over time. This means, that the power per transistor (processor core) grows with the frequency.

To reconcile the more power-hungry multi-core devices and the limited power budget, two strategies have been developed: Dim Silicon and Dark Silicon. The Dim Silicon strategy means that many or all cores get only a small power budget allocated, forcing them to operate at low frequency in a sub-threshold setting but enabling (almost) all cores to operate simultaneously. This is great for applications which can be easily parallelized. The Dark Silicon strategy on the other hand allocates large power budgets to a few cores which can now operate at a high frequency while all the other cores are turned off completely. This is great for applications which do not profit from parallelism but work predominantly sequential. The Dark Silicon strategy is well suited for CMOS transistors (e.g. FinFET) who are fairly power efficient at high frequencies but have a fairly poor Ion/Ioff characteristic in low-frequency sub-threshold settings. The Dim Silicon strategy on the other hand is fairly well suited for TFET transistors which use the quantum tunneling effect as they are fairly power efficient in low-frequency sub-threshold settings (but not at high-frequency with "full" voltage VCC>>Vth).

This was all background to the work described in the paper. As it is fairly difficult to predict what kind of applications will be run predominantly on a generic multi-core processor for embedded systems - you want to sell your chip to everyone, not just to some people - the authors started evaluating the possible ways to combine Dim Silicone cores implemented using TFETs and Dark Silicone cores implemented using CMOS FinFETs on the same die and, with extensive simulations, discover optimal partioning of processors, power budget and work allocation. They based their simulation on 32 Intel Atom cores of the Z5xx series and used 20nm transistor characteristics. Their first result is that the best partitioning of processors can be achieved with 24 TFET low-power cores (Dim Silicon) and 8 CMOS high-power (Dark Silicon) cores. The second result is that a dynamic allocation of power budgets and workloads resulted in the best trade-off between power consumption and performance.

You might think: "D'OH, that's sort of obvious, isn't it?" Well, no, it isn't. It does meet expectations but multi-core systems raise surprisingly complex issues and if you start mixing different transistor types for different cores, switching between different voltage and frequency levels, trade-offs don't always show the expected result. This paper has given an analysis as comprehensive as possible and will provide an important stepping stone for additional explorations. For example, one would not expect the same processor core to be used over and over again but would expect to see at least some specialized cores for multimedia applications, cryptography and communication, some of them with hard real-time constraints. These will complicate the allocation schemes further, having different power consumption characteristics.

I believe, this paper highlights some of the complexities waiting for Embedded Systems Designers (including those working on Embedded Systems security) very well and shows some very interesting directions to move forward.

Tuesday, October 9, 2012

Let me start with a short introduction to Embedded Systems Week: It consists of three co-located conferences, CASES, EMSOFT and CODES+ISSS and several workshops which all cover different aspects of embedded systems, bringing together experts on (almost all) aspects of embedded systems. So far, ESWeek did have one major drawback: Security of Embedded Systems was a complete non-issue except at the (rather low-key) WESS workshop. (If you don't believe me, just look at my blog posts from last year's ESWeek: [ESWeek'11 day 1], [ESWeek'11, day 2] and [ESWeek'11 day 3].) So I was very positively surprised that CASES had a session on security of embedded systems today.

Because of that, I decided to blog about the best talk of that session: "Static Secure Page Allocation for Light-Weight Dynamic Information Flow Tracking" by Yunsi Fei (the speaker), Juan Carlos Martínez Santos and Zhijie Jerry Shi. In their paper, they present an efficient variant of separating data in memory into "trusted" and "untrusted" categories in order to protect against buffer overflows occurring from malicious user entries. At compile time, all data is categorized either as trusted (all data which does not originate from a user/external input) or as untrusted (all data which originates from user input). To reduce the overhead from marking each individual variable as trusted/untrusted, they enforce separate memory pages for trusted and untrusted data. Most of the runtime checking can be handled by dedicated hardware support.

While this certainly does not cover all potential problems on a normal Computer or on smartphones - most notably the problem of code generated buffer overflows isn't covered - we do have to keep in mind that the world of embedded systems is extremely large, ranging from very simple RFID tags over light-weight Wireless Sensor Network Nodes to systems as complex as modern smartphones and game consoles. Depending on the system abilities and intended use-cases, the security requirements vary greatly and especially for the light-weight systems malicious code may not be a problem at all. Instead, a tailored security solution that covers the specific security requirements of this particular system as efficiently as possible is required and this work will be an interesting addition to the existing solutions - valuable for some (but certainly not all) embedded systems.