Thursday, November 10, 2016

Published at USENIX 2016 and written by Yupeng Zhang, Jonathan Katz, and Charalampos Papamanthou from University of Maryland this paper proves again how difficult is to get proper security in searchable encryption.

Some of you may wonder why I chose this paper. Reason is that I wanted to get a grasp of how research looks like outside of the MPC area at a top security conference. And what other conference to pick than this year's USENIX?

After I finished reading this paper I realised a cool thing: that you need little to none a priori knowledge about Searchable Encryption (SE) [1]. Fortunately I didn't find myself diving into the literature of SE in order to fully understand the ideas presented there - how many crypto papers have you read and still be able to say this once you're done with them?

Let's introduce first the notion of SE. The setup consists in 2 parties, a client and an encrypted mail server. The client wants to obtain all e-mails which have 'alice' and 'work' as tags. He then computes a token in a deterministic manner, then the server replies with the e-mails corresponding to the query:

To ease the presentation we can assume that instead of e-mails the server will respond with identifiers (urls).

An adversary wins if he breaks the token privacy, namely it recovers keywords from tokens.

A file injection attack is when an attacker has encrypted e-mails of his choice on the server and can see the tokens which are queried by the client. More simple, the server behaves in a honest-but-curious way and stores encrypted e-mails of his choice by spamming the client.

From a first sight this seems harmless. But if you combine it with some leaked e-mails (lots of them these days) the authors managed to have a 70% recovery rate for a keyword from a token with only 20% of leaked files. This was tested using an Enron email dataset of ~30000 emails. [2]

To understand how this works it's important to look at a simpler attack:

Binary search attack.

Suppose an adversary has a keyword universe $K$, with size $|K|$ a power for $2$. He can inject $\log_2{|K|}$ files and then recover keywords from every token with $100\%$ success.

Each file $F_i$, $i \in [0, \log_2{|K|}] $ contains keyword $k_j$ for which $j$ has $i$'th most significative bit equal to $1$. To see why this works let's look at an example inspired from the paper.

In this case we have $|K| = 8$. Black cells are the keywords inserted in file $i$. Since the search result is $F_1$, $F_3$ ($101$) we conclude that the token was derived from the keyword $k_5$.
A server which inserts one email per day can execute an attack in $2$ weeks time for $10,000$ keyword universe.

Countermeasure.

One crucial observation is that we insert $|K|/2$ keywords per file. So one can think that limiting the keywords per file to some threshold $T << |K|/2$ will fix the problem.

The threshold countermeasure turns out to be ineffective. The authors proved this with an attack which works almost like the first one after splitting the keywords in chunks of size $2T$. (denoted as hierarchical search attack).

For $|K| = 5,000$ and $T=200$ an adversary should can inject $131$ files in order to recover keywords for every token.

Attacks with partial leakage.

Sometimes e-mails leak. And when that happens...SE is almost useless against adaptive injection attacks as the authors prove. We say adaptive because it needs a forward insecure SE scheme.

What if we have want to recover keywords from multiple tokens?

The idea is to compute the distribution for each keyword $f^*(k)$ from the leaked files. Then the assumption that $f^*(k)$ is close to the queried tokens distribution $f(t)$ is made - which will turn true latter.
Next, a small candidate set of keywords is chosen (also called ground truth) and files are injected using the binary search attack. The rest of the tokens are recovered by taking joint distributions with previous tokens.

For more attacks and countermeasures which fail I strongly recommend reading the article [3].
At the end of the article one can wonder if building secure SE schemes would really help against these access pattern attacks...The authors suggest that an interesting direction is to look into lower bounds on these attacks but this seems a far from trivial task.

Friday, November 4, 2016

This blog post is the final in a series of three in which we describe what SPDZ is. The first part can be found here, and the second here.

In this part, we consider what it is that makes SPDZ SDPZ.

SPDZ MPC

Using
the term SPDZ is perhaps a little misleading. There are actually a few
protocols which we call SPDZ because they are really just improvements
on the original. As the online phase of SPDZ is already ‘pretty good’,
optimisations to the protocol normally involve improving the offline
phase.

In
what follows, we talk about the methods used in SPDZ for preprocessing
(generating the triples) and then explain some improvements that have
been made.

Original SPDZ

Here we will outline some of the methods in the original SPDZ protocol, as in [5].

Preprocessing

In
the first SPDZ paper, the authors show how to use somewhat homomorphic
encryption (SHE) to generate the triples. This is a (pretty expensive)
public key cryptosystem in which the encryption operation is ring
homomorphic - that is,
\[\textsf{Enc}_{\textsf{pk}}(a)\boxplus\textsf{Enc}_{\textsf{pk}}(b)=\textsf{Enc}_{\textsf{pk}}(a+b)\]
and
\[\textsf{Enc}_{\textsf{pk}}(a)
\boxdot \textsf{Enc}_{\textsf{pk}}(b) = \textsf{Enc}_{\textsf{pk}}(a
\cdot b)\]

(For
those who know about FHE: we can do MPC with FHE, as you’d hope; however, at
the time of writing no FHE scheme which is competitive with current
secret-sharing MPC currently exists. SPDZ is what the authors call the
‘ideal use of FHE in MPC’: raw material is computed locally so that the
online computations are minimal and communication overhead is small.)

During the offline phase, we also generate lots of random values which are used by the parties to provide input to the circuit.

Sacrificing

In
order to check that the adversary has not introduced errors into the
triples, for each triple we want to use we have to ‘sacrifice’ a triple
by taking another triple from the preprocessing and checking that the
third element is the product of the first two using Beaver’s method
outlined in the previous post. Fortunately the method by which triples are generated is
very efficient.

ZKPoPKs

Zero-Knowledge
Proofs of Plaintext Knowledge (ZKPoPKs) are a way of ensuring parties
encrypt as they should: when encrypting in the SHE scheme, there are
conditions on the plaintext and noise that need to be met to ensure the
ciphertext is well-formed.

MACs

Before
any preprocessing computation takes place, the parties agree on some
MAC key $\alpha$ with which they MAC all their data and which they
secret-share amongst themselves so that no individual party knows the
MAC key.

This MAC key is used to MAC all the data in the circuit. For each secret-shared value, there is also a secret-shared MAC on that value.

After
the circuit has been evaluated, the parties open the secret
corresponding to the circuit output and also the MAC on it, and check
that the MAC is correct. If an actively corrupt party cheats at any
point in the circuit
evaluation, the final MAC check reveals this has happened. Note that
this means the parties could be computing on invalid data.

In
the original paper, each party also MACked the global MAC key with a
personal MAC key. This was not needed in subsequent works.

Updated SPDZ: SDPZ2

While
the online phase of SPDZ was (almost) the same, in SPDZ2 [4] the
authors made significant changes to the offline phase of the original
protocol. We outline the major changes here.

They solved an open problem left by the first paper in giving a secure protocol which generated a shared SHE decryption key.

They
also offered various performance enhancements, including the
introduction of ‘squares’ and ‘bits’ in the preprocessing, which allowed
faster online computations of specific operations in the circuit.

A
major improvement was allowing MACs to be checked on data before the
end of the computation. This involved checking the MAC of a public value
that depended on the underlying data. The big advantage of this is that
the preprocessed data MACked under the same key can still be used after
this MAC check, but not in the original protocol since the check
reveals the key.

The
new protocol had covert and malicious variants, the former having
lesser parameter requirements. A covertly secure protocol ensures the
adversary wins with probability at most $1/n$ for some $n \in
\mathbb{N}$. The reason for this was that it was believed more closely
to match the real-world situation.

New SPDZ: MASCOT

The
most recent improvements to the SPDZ protocol is a work known as MASCOT
[3]. As with SPDZ2, the online phase of SPDZ was left as it was; this
paper improves on the offline phase, showing how to use oblivious
transfer (OT) to achieve much faster preprocessed data than by using
SHE. It is authored by Bristol’s own Marcel, Emmanuela and Peter, and
appeared at CCS last week.

(Oblivious
transfer is a process in which one party inputs two strings and a
second party chooses exactly one; this is done in such a way that the
first party doesn’t know which string the second chose, and the second
does not learn anything about the string it didn’t choose.)

The
authors of MASCOT note that the heavy machinery of public-key
cryptography required in the original SPDZ paper to generate the
preprocessing (namely, the use of SHE) prevents the communication from
being a bottleneck since the local computation takes so long.

The
protocol is actively secure and performs significantly faster than
previous SPDZ2 implementations, with much lower communication overhead.

The Future of SPDZ

MASCOT has only just been released, and there are already whispers of a new paper. Watch this space...

Thursday, November 3, 2016

I saw the paper being presented at the IEEE Security and Privacy Symposium, and it was one of the more enjoyable talks due to the nature of the exploit, and the enthusiasm of Erik Bosman presenting it. It also won a 'PWNIE' award for the 'most innovative research on hacking' at Black Hat USA this year.

The idea of the paper is simple - to combine the Memory Deduplication exploit with a Rowhammer attack.

They use this to craft a JavaScript-based attack against the Microsoft Edge browser, to gain arbitrary memory read and write access in the browser.

Memory Deduplication

The idea is to reduce the total memory footprint of a running system by sharing memory pages with identical content across independent processes.

A good example of this is the page cache in modern operating systems. The page cache stores a single cached copy of file system contents in memory, and shares the copy across different processes.

In running processes, two or more pages with the same content in run-time memory are always deduplicated, even if the pages are completed unrelated, and their equivalence is fortuitous.

So, in order to keep a single copy of a number of identical pages, the memory deduplication system needs to perform three tasks:

Detect. The system needs to detect memory pages with the same content, usually done at regular and predetermined intervals during normal system operations

Only Keep One. We only want to keep one physical copy, and return the others to the memory allocator; to do this, the deduplication system updates the page-table entries of the owning processes, so that the virtual addresses now point to a single shared copy; these entries are also marked as read-only to support copy-on-write semantics

Create Private Copy on Write. Specifically, when one of the owning processes writes to the read only page, a copy-on-write page fault occurs; at this point, the memory deduplication system can create a private copy of the page, and map it into the corresponding page table entries of the faulting process

On Windows 8.1 onwards, it's known as memory combining.

Memory Deduplication as a Side Channel

So you may have spotted how you could exploit this as a side channel. Writing to a shared page from any of the owning processes results in a page fault and a subsequent page copy.

Due to these additional expensive operations, a write to a shared page takes significantly longer compared to a write to a regular page (by significantly longer, we're talking up to one order of magnitude). This timing difference provides an attacker with a side channel to detect whether a given page exists in the system.

To do this, the attacker has four easy steps:

Craft a page with the exact same content as the target page

Wait for the memory deduplication system to do its thing

Write to the crafted page

Time how long it takes for the write operation to finish

If the write takes longer than a write to a non-deduplicated page, the attacker can conclude that a page with the same content exists.

So just using this, an attacker could detect whether a user was visiting a particular web page, or running a particular program.

Of course, false positives are possible due to noise etc, so the attacker can use redundancy (repeated attempts) to disclose the intended information in a reliable way.

At the moment, the memory deduplication side channel is looking a little bit pathetic, as it's a slow single-bit side channel that can only be used for leaking a limited number of bits from a victim process.

However, we can abuse memory deduplication to get some stronger primitives.

Primitives

Primitive #1: Alignment Probing

First up, we have a primitive that allows a byte-by-byte probing of secret information by controlling its alignment in memory. So, this primitive is applicable when the secret data has weak alignment properties.

Imagine in this case, the secret is a password stored in memory immediately after a blob of attacker-provided input bytes. What the attacker can do is provide some more input bytes, which effectively push the second part of the secret out of the target page. Now the attacker can brute force the first part of the secret (e.g. one byte) using deduplication.

All the attacker needs to do now is provide fewer input bytes to bring another portion of the secret back into the page. Just like before, the attacker uses deduplication to get the rest of the secret. For larger secrets, you just keep repeating these steps until all of the secret is known.

This alignment probing primitive is apparently very effective, and is used to disclose code pointers in Microsoft Edge, and a password hash in nginx.

Primitive #2: Partial Reuse

When the secret has strong memory alignment properties, the first primitive falls through.

Up next is a primitive that can be used when attacker-controlled input can partially overwrite stale secret data with predictable reuse properties. User-space memory allocators encourage memory reuse and do not normally zero out deallocated buffers for performance reasons.

This means that a target application often reuses the same memory page and selectively overwrites the content of a reused page with new data.

So, same set up as before. This time, when the attacker gives a larger input, it overwrites the first part of the secret. Now the attacker can brute force the second part using memory deduplication.

Once this is known, the attacker can brute force the first part of the secret, by deduplicating against a page without an overwritten secret.

This primitive is apparently fairly common in practice, and is used in the paper to break heap ASLR (Address Space Layout Randomisation) in nginx.

Primitive #3: Birthday Heap Spray

Last one. This primitive can leak a secret even when the attacker has no control over memory alignment or reuse. It relies on a generative approach that revolves around the birthday paradox, which we all know and love.

This primitive is applicable when the attacker can force the application to controllably spray target secrets over memory.
Up until now we have only assumed there is one secret we want to leak.

If a partially masked secret has $P$ possible values, we use memory deduplication to perform $1 \times P$ comparisons between the $P$ probe pages and the single target page - essentially brute forcing the secret.

For large $P$, a very large amount of memory is required, for the attack, and for the redundancy. However, if the attacker can cause the target application to generate many secrets, memory deduplication provides a much stronger primitive than brute forcing.

For instance, an attacker can generate a large number of secret heap pointers by creating a large number of objects from JavaScript, each referencing another object.

So, assume the attacker causes the application to generate $S$ such pages, each with a different secret pointer. The attacker now also creates $P$ probe pages, with $P$ being roughly the same size as $S$.

Each probe page uses the same encoding as the secret pages, except that, not knowing the secret pointers, the attacker needs to 'guess' their values. Each probe page contains a different guessed value. The idea is to find at least one of the probe pages matching any of the secret pages. So we have a classic Birthday Problem, where the secret and probe values are the birthdays.

Since memory deduplication compares any page with any other page in each deduplication pass, it automatically tests all $P$ probe pages against the $S$ target secret pages.

A hit on any of the $P$ possible values immediately exposes a target secret. This birthday primitive reduces the memory requirements of the attack by a factor of $S$. It's especially useful when leaking the location of randomised pointers. For exploitation purposes, it's not important which pointer the attacker hits, as long as at least one of them is hit.

This primitive is used to leak a randomised heap pointer in Microsoft Edge, and also used to craft a reliable Rowhammer exploit.

Rowhammer Exploit

Anyway, now we get to the good bit; the Rowhammer exploit. The Rowhammer bug was first publicly disclosed in June 2014 by Kim et al. However, it was not until March 2015 that someone published a working Linux kernel privilege escalation exploit using Rowhammer.

Essentially, Rowhammer is DRAM vulnerability that allows an attacker to flip bits in a (victim) memory page by repeatedly reading from other (aggressor) memory pages. These bit flips are deterministic, meaning that once a vulnerable memory location is identified, it is possible to reproduce the bit flip patterns by reading again the same set of aggressor pages.

There are two variations of the Rowhammer attack. Single-sided Rowhammer repeatedly activates a single row to corrupt its neighbouring rows' cells. Double-sided Rowhammer targets a single row by repeatedly activating both its neighbours rows. Although double-sided is more effective than single-sided in practice, it requires you to know not only the location of what you want to target, but also the two adjacent memory pages. Therefore, the authors of this paper choose to use the single-sided Rowhammer attack.

So, if you want to use the Rowhammer attack to do something useful, rather than to just corrupt random memory pages, then first you need to find a vulnerable memory location. To do this, you can allocate a large array filled with doubles. If you do it right, you can end up with an encoded double value such that all bits are set to 1.

Then, you find some eviction sets so you can bypass the cache, and you hammer 32 pages at a time (though it doesn't say where they get the number 32 from). By hammer, I mean you read from each page two million times before moving to the next page. Once you've gone through all 32 pages, you check the entire Array for bit flips. After scanning a sufficient number of pages, you will know which bits can be flipped at which offsets.

Next, you want to determine what to place in these vulnerable memory locations to craft the exploit. The idea behind the Rowhammer attack in this paper, is to place some data in the array which, after a bit flip, can yield a reference to a controlled counterfeit object.

So once we find out which bit is flipped, we can store a valid object reference at the vulnerable memory location and pivot to a counterfeit object with Rowhammer.

So that's the attack - using memory deduplication to leak the code and heap pointers, and Rowhammer to pivot them to a counterfeit object. In practice, the paper says it takes about 30 minutes for the two deduplication passes and 508MB of memory. It also takes 1GB of memory to find bit flips, and 32MB to find the cache eviction sets. The timing of the Rowhammer depends on the vulnerable DRAM chips, so they say it varies from seconds to hours in practice.

Mitigation

So finally, the last part of the paper talks about how one would prevent attacks like this. A fairly obvious observation would be to just no deduplicate, as that prevents all exploits like this. However, deduplication contributes heavily to using physical memory efficiently, so we would like to keep it if possible.

This paper suggests that you only deduplicate zero pages. A zero page is a page with a starting address of zero, or rather, the series of memory addresses at the absolute beginning of a computer's address space. The paper doesn't go into much detail about why, but claims they've done experiments to show that it's nearly as efficient as full deduplication, but entirely eliminates the possibility for any attacks targeting memory deduplication.

Fin

So that's the end of paper, they finish by saying memory deduplication is dangerous in the wrong hands, and they've been working with their contacts at Microsoft to find an immediate solution.

I had a look round on the internet but no-ones has published anything further since the paper, so who knows?