Theory (MPC)

Last Updated: 16 December 2018

In this crypto bite, we'll look at the theory behind secure multiparty computation (MPC), also called secure function evaluation (SFE). After showing real-world applications of MPC, we'll define some useful security requirements that we expect of SFE protocols. We'll also define various threat models. To prove the security of an SFE protocol, it is not sufficient to check whether the security requirements are met. Instead, using the ideal- vs. real-world paradigm, we'll give a very general definition of security for SFE protocols that does not depend on an arbitrary list of security requirements. Finally, we'll show how to prove security in the semi-honest and in the malicious setting. Mastering these concepts is important since they are required for understanding the capabilities as well as the limitations of SFE/MPC. Learning them on their own is also rewarding, as most definitions are not specific to MPC, but are used nearly everywhere in cryptology. Everything in this crypto bite is based upon [LP08, L15].

Introduction

Put simply, in secure multiparty computation, also called secure function evaluation, multiple parties \(P_1, P_2, \dots, P_n\) jointly evaluate a function \(y = f(x_1, x_2, \dots, x_n)\) in such a way that each party \(P_i\) contributes its input \(x_i\), subject to the privacy property[L17]: \(P_i\) learns nothing about the other parties' inputs \(x_j, j \ne i\), except what can be inferred from its own input \(x_i\) and the output \(y\).

This is not the most general definition yet, but it is good enough to get started. We'll see below how to generalize it to functionalities.
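To make "except what can be inferred from the output" concrete, here is a minimal sketch that lists which inputs of the other party remain possible after a two-party evaluation. The AND function is just a convenient example, and the helper name `consistent_inputs` is hypothetical, chosen for illustration only.

```python
from itertools import product

def and_gate(a, b):
    """Example function f(x_a, x_b) = x_a AND x_b on single bits."""
    return a & b

def consistent_inputs(f, own_input, output, domain=(0, 1)):
    """Inputs of the other party that are compatible with what we know.

    A party knows only its own input and the output; every value returned
    here is still possible from that party's point of view.
    """
    return [x for x in domain if f(own_input, x) == output]

for own, other in product((0, 1), repeat=2):
    y = and_gate(own, other)
    candidates = consistent_inputs(and_gate, own, y)
    print(f"own={own} output={y} -> other input could be {candidates}")
```

A party that contributes 0 cannot tell the other input apart from the output alone, while a party that contributes 1 learns the other input exactly. This leakage is inherent in the function itself, so no protocol can prevent it.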

SFE in Real Life

SFE is an incredibly general cryptographic construction. In fact, many cryptographic protocols can be cast as an SFE problem. For example, when both Alice and Bob compute a shared key using the Diffie-Hellman key exchange protocol, they SFE-compute a function \(k_{ab} = f(a, b) = g^{ab}\) of both Alice's secret key \(a\) and Bob's secret key \(b\), yet without Alice knowing \(b\) nor Bob knowing \(a\). In fact, your browser is only able to show this page, because it ran some form of key exchange with the web server, and thus ran an instance of SFE computation! Many other instances of key exchange are an SFE problem at heart.
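The key exchange described above can be sketched in a few lines. This is a toy illustration only: the parameters `P` and `G` are tiny assumed values, and the function names are hypothetical. Real deployments use standardized groups with far larger parameters.

```python
import secrets

# Toy public parameters, assumed for illustration (far too small for real use).
P = 23  # small prime modulus
G = 5   # generator of the multiplicative group modulo 23

def keygen():
    """Pick a secret exponent and derive the public value g^secret mod P."""
    secret = secrets.randbelow(P - 2) + 1  # secret exponent in [1, P-2]
    public = pow(G, secret, P)
    return secret, public

def shared_key(own_secret, peer_public):
    """Combine our own secret with the peer's public value."""
    return pow(peer_public, own_secret, P)

a, g_a = keygen()  # Alice keeps a, sends g^a
b, g_b = keygen()  # Bob keeps b, sends g^b

k_alice = shared_key(a, g_b)
k_bob = shared_key(b, g_a)

# Both sides end up with g^(ab) without ever sending a or b.
assert k_alice == k_bob == pow(G, a * b, P)
```

Each side only ever uses its own secret exponent and the other side's public value, which mirrors the statement that Alice never learns \(b\) and Bob never learns \(a\).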

But the application of SFE isn't limited to cryptographic protocols alone. In real life, it often happens that people or institutions who don't trust each other or who, by law, aren't allowed to share their data with others, still want or need to securely compute some function. The following scenarios barely scratch the surface.

Secure Dating: Alice and Bob met for a first date, and are about to decide whether or not they want to meet for a second date. In order not to hurt each other's feelings, they don't want to tell their answer directly, yet they somehow manage to resolve this conundrum by SFE-computing the AND function \(y = f(x_a, x_b) = x_a \wedge x_b\).

The Sum of Our Earnings: How much do all people in a room make? For obvious reasons, no one wants to disclose his or her salary to the group, but they still want to know the sum of all their earnings by SFE-computing the sum \(y = f(x_1, x_2, \dots, x_n) = \sum_{i=1}^n x_i\). If all parties are honest, there's an easy solution to this problem. In the presence of gossipy parties, or even malicious parties, it gets more difficult, but it is still feasible under some conditions.

Elections: The goal of an election is to SFE-compute the function \(y = \operatorname{majority}(x_1, x_2, \dots, x_n)\), where \(x_i\) is the vote cast by voter \(P_i\) for some candidate. Unless votes are public, we don't want the individual voter's vote to be disclosed, yet still, we want the electronic voting system to correctly compute the right winner \(y\) of the election.

Private Auctions: In a sealed bid auction, \(n\) bidders \(P_i\) secretly place their bids \(x_i\), and at the end, only the highest bid \(y = \max(x_1, x_2, \dots, x_n)\) (and optionally the winning bidder \(P_i\)) is disclosed by SFE-computing the maximum function \(\max\). Variations like Vickrey Auctions can be computed the same way.

Private Database Search: Did you ever feel queasy googling for some embarrassing key words, while being logged in as a Google user? Of course you did: database queries can reveal a lot about the people asking them. Using cryptographic techniques known as private information retrieval (PIR) [G04], one can query a database without telling it what is actually being queried. In SFE terms, we want to securely compute the function \(y = \operatorname{query}_x()\) where \(x\) is e.g. the SQL query string. Arguably, stated this way, while this is clearly an SFE problem (we want the database to execute the equivalent of the query \(x\) without knowing \(x\)), it may or may not be considered an MPC problem. Nevertheless it beautifully illustrates the SFE idea.

Privacy-preserving Data Mining: As a generalization of PIR, multiple parties want to compute some result by collectively mining each other's private databases. However, due to laws and regulations, these parties aren't allowed to share their databases. Think of hospitals keeping databases of their patients' medical records, yet still wanting to do some epidemiologic research by data mining all those databases. Using MPC, they can SFE-compute a function \(y = \operatorname{query_{k_1, k_2, \dots, k_m}}(x_1, x_2, \dots, x_n)\), where \(x_i\) is the database held by hospital \(P_i\), and \(k_j\) is some key word used in the query, like, say, "diabetes", "cause of death", and so on.

Private Set Intersection: Suppose that multiple intelligence agencies like CIA, MI5, Mossad want to identify a terrorist. Each agency holds a list of potential suspects. By combining these lists, they could narrow down the search by finding out whose names appear in all lists (arguably, those individuals are likely to have raised many red flags and are less likely to be false positives). Unfortunately, they can't simply send their lists to their partner agencies for fear of them being infiltrated by moles. Using MPC, they can still SFE-evaluate the set intersection \(y = f(x_1, x_2, \dots, x_n) = \bigcap_{i=1}^n x_i\), where \(x_i\) is agency \(P_i\)'s set of suspected terrorists.
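As an illustration of the "Sum of Our Earnings" scenario above, here is a minimal sketch based on additive secret sharing. This is one common approach and not necessarily the solution the source has in mind. It assumes that all parties follow the protocol, and the modulus and function names are assumptions made for this example.

```python
import secrets

MODULUS = 2**64  # assumed public modulus, larger than any possible sum

def share(value, n):
    """Split value into n random shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(salaries):
    n = len(salaries)
    # dealt[i][j] is the share that party i sends to party j.
    dealt = [share(x, n) for x in salaries]
    # Each party j adds up the shares it received and publishes only that total.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums add up to the sum of all salaries.
    return sum(partial_sums) % MODULUS

salaries = [52_000, 61_500, 48_250]
print(secure_sum(salaries))  # 161750
```

Every individual share is a uniformly random number, so a single share reveals nothing about the salary it came from. Only the combination of all partial sums reveals the total.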

Security Requirements

SFE is usually implemented by the parties running some sort of protocol. So when is an SFE protocol secure? To answer this question, we need to precisely define what it means for an SFE protocol to be secure. One possible approach is to identify a set of properties (security requirements) that we expect from a secure SFE protocol[CH17].

Privacy: only the output is learned, and nothing else.

Correctness: parties obtain the correct output, even if some parties misbehave.

Independence of Inputs: parties cannot choose their inputs as a function of other parties' inputs.

Fairness: if one party learns the output, then all parties learn the output.

Guaranteed Output Delivery: all honest parties learn the output.

As an example, consider the sealed bid auction, and an external adversary \(\mathcal{A}\) who wants to break the SFE auction protocol. How would the security requirements influence \(\mathcal{A}\)'s chances?

Privacy: \(\mathcal{A}\) learns an upper bound on all bids, and nothing else.

Correctness: \(\mathcal{A}\) can't win by placing a lower bid than the highest.

Independence of Inputs: \(\mathcal{A}\) can't bid one dollar more than the highest (honest) bidder.

Fairness: \(\mathcal{A}\) can't abort the auction if his bid isn't the highest (i.e. after learning the result).

Guaranteed Output Delivery: \(\mathcal{A}\) can't abort the auction (this is stronger than fairness since it doesn't require knowledge of the output nor completion of the protocol: it means that no denial of service attacks are possible at any point in time during execution of the protocol).

Exercise: What are \(\mathcal{A}\)'s chances with respect to these security requirements in the other application scenarios we sketched above (electronic voting, secure set intersection, ...)?

A set of security requirements is very convenient: all it would take to prove the security of an SFE protocol would be to check one requirement at a time, which is not so hard.

Unfortunately, there are huge drawbacks to this method:

how do we know that all concerns are covered? Since we don't know which strategy an adversary may employ, we can never be sure that we covered all our bases. What's even worse: our SFE protocol could later be used in settings and applications we never envisioned. Are the above security requirements sufficient in these new settings?

the definitions are application dependent, and need to be carefully reformulated for every new application. It is incredibly easy to inadvertently come up with bad definitions, which look good superficially and intuitively, yet fail to capture the real nature of the application, with all its peculiarities. We would really like to minimize the amount of hard definitional work that porting an SFE protocol to a new application requires.

For both reasons, cryptologists prefer to prove the security of an SFE protocol in the general case, using the ideal world vs. real world paradigm. But before we come to that, we need to talk a little about threat models and to classify adversaries. We'll also need to talk about different kinds of security. So let's get started.

Threat Models

Allowed Adversarial Behavior

We distinguish three types of the parties' behavior, depending on what the adversary can do with the parties under her control. The parties can be:

honest: they follow the protocol, and they are not under the adversary's control. Their transcript remains secret, i.e. hidden from the adversary's eavesdropping eye. Furthermore, their internal state (instruction and data memory) remains hidden from the adversary as well.

semi-honest: they too follow the protocol, but they are under the adversary's control. The adversary doesn't change the algorithm they compute nor does it fiddle with the data, i.e. internal state, but is interested in learning their transcript. In other words: semi-honest parties' instruction and data memory is read-only to the attacker. Semi-honest parties are also known as honest-but-curious parties.

malicious: they don't necessarily follow the protocol. Instead, they behave in totally different ways, as instructed by the adversary. Furthermore, the adversary has access to their transcripts and internal state. In other words: the adversary has read-write access to the (instruction and data) memory of the malicious party, and can basically instruct it to deviate from the protocol in any arbitrary way it deems necessary. Malicious parties are sometimes called Byzantine parties.

But what are Transcripts?

We've talked about transcripts above. But what are transcripts exactly? A transcript of a party, say, Alice, is an ordered list of all her communications with all other parties during the course of the protocol's execution. For example, if the protocol consists of the following exchange of messages:

Note in particular that transcripts allow us to "peek inside secure channels". Indeed, even though Alice would be sending and receiving messages via secure links (in the information-theoretic sense when in the ideal world, in the computational-security sense in the real world) to Bob and Carol, knowledge of Alice's transcript effectively makes the links between Alice and Bob, and between Alice and Carol "transparent". An attacker knowing Alice's transcript will get to know \(g^a, g^b, \dots\), even though they were sent over secure channels. Of course, the attacker still won't be able to peek inside the secure channel between Bob and Carol, unless he happens to also have access to Bob's or to Carol's transcript (which he cannot, unless he controls one of Bob or Carol too).
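The notion of a transcript can be sketched as a simple data structure. The message exchange below is hypothetical and only serves to illustrate the point; the class and function names are likewise assumptions for this example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    payload: str

@dataclass
class Party:
    name: str
    # Ordered list of every message this party sent or received.
    transcript: list = field(default_factory=list)

def deliver(parties, msg):
    """Record msg in the transcripts of its sender and its receiver only."""
    parties[msg.sender].transcript.append(msg)
    parties[msg.receiver].transcript.append(msg)

parties = {name: Party(name) for name in ("Alice", "Bob", "Carol")}

# Hypothetical exchange over pairwise secure channels.
exchange = [
    Message("Alice", "Bob", "g^a"),
    Message("Bob", "Alice", "g^b"),
    Message("Bob", "Carol", "g^b"),
    Message("Carol", "Bob", "g^c"),
]
for msg in exchange:
    deliver(parties, msg)

# What an adversary learns from Alice's transcript alone.
alice_view = [m.payload for m in parties["Alice"].transcript]
print(alice_view)
```

An adversary holding only Alice's transcript sees the messages on Alice's links, but nothing that was exchanged between Bob and Carol.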

Corruption Strategies

The adversary may build the set of parties under its control in the following ways:

static adversary: the set of corrupted parties under the control of the adversary is known in advance, i.e. before the SFE protocol starts, and is static: parties that are corrupt at the beginning stay corrupt the whole time. Parties that are honest at the beginning, stay honest the whole time. This is a relatively easy scenario to reason about. We may not assume knowledge of the exact set of corrupted parties in security proofs, but we'll always assume that this set is static, i.e. doesn't change during the run of the SFE protocol.

adaptive adversary: the set of corrupted parties grows, depending upon the adversary's decisions and inputs. Parties that are corrupt stay corrupt. Parties that are honest, may or may not become corrupt at some point during the run of the SFE protocol; but if they do, they stay corrupt. This situation is much more difficult to reason about, because the set changes dynamically during the protocol's execution. Even though it is hard to work with this assumption, it is nonetheless necessary in many real-world applications, since it better captures how hackers penetrate servers, then use the knowledge collected there to jump into additional servers, adding them to the set of corrupt servers they control.

adaptive, but cautious (covert) adversary: in some settings, the adversary may not want to be caught, and will behave in a risk-averse manner, e.g. by reducing its amount of communications and computations on the corrupted parties to avoid triggering an IDS (intrusion detection system). Here, parties that are corrupt may or may not stay corrupt and could also become honest again. Honest parties may or may not become corrupt. Put differently, the set of corrupted parties grows and shrinks dynamically. In the real world, there could be some cost associated with being caught, e.g. a company participating in an MPC protocol could be thrown out of the consortium for cheating. We will not cover cautious adversaries below since they are a very difficult type of attacker to reason about, but keep them in mind. In any case, the one extreme at the low end of the scale of the adversary's "power" is a static adversary, the other extreme at the high end of the scale is an adaptive adversary. Cautious adversaries are somewhere in the middle of that scale: how low or high they are depends on the risk they are willing to incur for controlling additional parties or for relinquishing control over previously corrupted parties.

Computational Powers of Adversaries

Depending on the computational power that we're willing to concede to the adversary, we have the following kinds of adversaries:

probabilistic polynomial-time (PPT) adversaries: they can compute functions in polynomial time, where the polynomial is usually taken over the length of the input. The execution time is therefore \(O(p(|x|))\), with \(p\) a polynomial of some arbitrary degree. Furthermore, a PPT adversary may "toss coins", i.e. have probabilistic output. Only languages in the complexity class \(\mathcal{BPP}\) (bounded-error probabilistic polynomial time) can be recognized by a PPT adversary. This limits the functions that PPT adversaries can compute.

computationally unbounded adversaries: they have basically unlimited computational power. They can e.g. decide any language in the complexity class \(\mathcal{NP}\), including languages believed to lie outside \(\mathcal{BPP}\) (whether \(\mathcal{BPP} \subseteq \mathcal{NP}\) holds is itself an open question). They are purely theoretical, and can't be implemented in the physical world. Yet, they are useful "absurdly strong" adversaries to test the security of our constructions against.

The thesis is that efficient computations correspond to computations that can be carried out by probabilistic polynomial-time Turing machines[FOC1, 1.3.2.3, page 15].

There is also a relation between the powers of the adversary and the kind of security they represent. To protect against computationally unbounded adversaries, we need to prove that a construction or protocol is secure in the information theoretic model. To be secure against PPT adversaries, it is sufficient to prove security in the much weaker computational model. E.g. the BGW protocol can be proven to be secure in the information theoretic sense (when \(t < n/3\)), while the GMW protocol is only secure in the computational security sense, because it makes use of oblivious transfer (OT), which is implemented using cryptographic assumptions that themselves only hold in the computational model.

Different Execution Settings for SFE Protocols

We distinguish the following execution settings:

stand-alone: the computing parties run one, and only one, instance of an SFE protocol from the beginning to the end. They don't get disturbed by other duties from the instant the protocol starts up to the instant the protocol ends. No foreign computations and parties can influence the outcome of the protocol. When we reason about the security of MPC, this is the setting we usually assume, as it is much easier to reason about stand-alone executions.

concurrent general composition: the computing parties execute multiple instances of an SFE protocol concurrently. Imagine e.g. a cluster of compute servers which run some multitasking operating system like Unix. Each compute server would run a process per protocol instance, therefore multiple instances of the protocol run concurrently. One may think that if those instances are well isolated from each other, the security of each protocol instance would be the same as that of a stand-alone execution. Unfortunately, this is not (always) the case[citation needed]. Therefore, reasoning about concurrent execution isn't easy, despite this setting being prevalent in practice.

Types of Security

We define three types of security, from weakest to strongest: computational, statistical, and perfect. In the context of encryption, we would thus (somewhat informally) define:

computational security: a probabilistic polynomial time (PPT) distinguisher can't tell the difference between e.g. a set of ciphertexts and a set of random garbage with more than negligible probability.

statistical security: same as computational security, but now, we don't require that the distinguisher be PPT. It can be computationally unbounded. We say that the set of ciphertexts and the set of random strings are statistically close.

perfect security: same as statistical security, but additionally we require that the negligible probability be zero. In other words, the set of ciphertexts and the set of random strings are identically distributed.
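To illustrate the strongest of the three notions, the sketch below computes the statistical distance between ciphertexts and uniformly random strings for a toy one-time pad on 2-bit messages. The one-time pad is chosen here as a standard textbook example of perfect security; the bit length and helper names are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction

BITS = 2                  # toy message/key length, assumed for illustration
SPACE = range(2**BITS)    # all 2-bit strings, encoded as integers 0..3

def ciphertext_distribution(message):
    """Distribution of message XOR key when the key is uniform over SPACE."""
    counts = Counter(message ^ key for key in SPACE)
    return {c: Fraction(counts[c], len(SPACE)) for c in SPACE}

def statistical_distance(p, q):
    """Half the L1 distance between two distributions over SPACE."""
    return sum(abs(p[c] - q[c]) for c in SPACE) / 2

uniform = {c: Fraction(1, len(SPACE)) for c in SPACE}

distances = [
    statistical_distance(ciphertext_distribution(m), uniform) for m in SPACE
]
print(distances)
```

For every message the distance to the uniform distribution is exactly zero, which is what "identically distributed" means. Statistical security would allow a negligible but non-zero distance, and computational security only requires that no PPT distinguisher can exploit whatever difference exists.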

Security in the SFE context will have distinguishers trying to tell the difference of what happens in the real world vs. what happens in the ideal world. Both worlds are introduced in the next section. The exact definitions of computational, statistical, and perfect security of an SFE protocol will furthermore depend upon the threat model. In any case, security can then be proven using the widely popular simulation method[L18].

The Ideal- vs. Real-World Paradigm

The idea of the ideal- vs. real-world paradigm may seem weird at first, but it is a basic method to prove security of almost any cryptographic protocol involving communications. In a nutshell, we first assume that the protocol is perfectly executed in an ideal world which contains an incorruptible trusted third party which runs the protocol perfectly, and we'll posit that if that protocol is secure (in a way to be defined below), then any attacker who manages to corrupt some parties in the real world won't be able to learn more information than an attacker in the ideal world (which we'll call a simulator) could learn by merely injecting false data into the incorruptible trusted third party[L17, L18].

If you're already familiar with zero-knowledge proofs, this paradigm will seem more than just vaguely familiar. Details follow.

The Ideal World

In the ideal world, a trusted third party computes the function \(y = f(x_1, x_2, \dots, x_n)\) on its own, announces the correct result \(y\), and can't be corrupted by an adversary. The ideal parties are connected to the trusted third party via ideal secure links, which are links that guarantee privacy and integrity in a strong information-theoretic sense. They merely send their inputs \(x_i\) to the third party, and then wait for the reply \(y\).

This ideal world setup is very easy to understand and reason about. All the complexity of SFE protocols is hidden away within the trusted third party which can compute \(f\) all by its own "in one step".

The Ideal World Adversary (simulator)

Any ideal adversary, which we traditionally call a simulator, attacking the protocol is an additional external party which can take control of the regular parties, but not of the trusted third party. Control of a party means that the adversary may

read the transcript of that party's communication with the trusted third party, but without otherwise interfering: the party becomes semi-honest

additionally make that party behave in a Byzantine way, i.e. in a totally different way from what the protocol prescribes. In particular, the adversary may change \(x_i\) to \(x'_i\) which it will then inject into the trusted third party. The party under the control of the adversary becomes malicious.

speak on behalf of the parties it controls: this means that unlike honest parties, corrupt parties won't send any inputs directly to their recipient (the trusted third party). Instead, they defer to the adversary to assume this role. The adversary has a right of veto, so to speak, to block a controlled party from sending data to the trusted third party. This is useful to model fail-stop failures, where a party may otherwise be semi-honest but is knocked down by the adversary.

last but not least, the trusted third party will first send its computed output \(y\) to the adversary, which will then decide whether the trusted third party is allowed to broadcast \(y\) to all parties (it sends a \(\operatorname{proceed}\) message to the third party), or not to send \(y\) to anybody else (it sends a \(\operatorname{stop}\) message to the third party). Note that the adversary isn't sending \(y\) to the parties, it is the trusted third party which does so (if allowed to proceed by the adversary). In other words: the adversary learns \(y\) and may or may not prevent its delivery to the parties. It does not, however, have the power to change \(y\) to \(y'\). This strange adversary veto setting is necessary to model failures to the fairness security requirement (see above).
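The ideal-world execution with its adversarial powers (input substitution, and the proceed/stop veto on output delivery) can be sketched as follows. This is a simplified model under stated assumptions: the function `ideal_execution` and its parameters are hypothetical names, and inputs are passed to \(f\) in sorted order of party names.

```python
def ideal_execution(f, inputs, corrupted=frozenset(), substitute=None, proceed=True):
    """Run one ideal-world execution with an incorruptible trusted third party.

    inputs:     dict mapping party name -> that party's honest input
    corrupted:  set of parties controlled by the simulator
    substitute: dict mapping corrupted party -> input chosen by the simulator
    proceed:    True models the 'proceed' message, False models 'stop'
    """
    substitute = substitute or {}
    # The simulator speaks for corrupted parties and may replace their inputs.
    submitted = {
        p: (substitute.get(p, x) if p in corrupted else x)
        for p, x in inputs.items()
    }
    # The trusted third party computes f on whatever was submitted.
    y = f(*(submitted[p] for p in sorted(submitted)))
    # The simulator sees y first, but cannot change it.
    adversary_view = y if corrupted else None
    # Without corrupted parties there is nobody to veto the delivery.
    deliver = proceed or not corrupted
    outputs = {p: (y if deliver else None) for p in inputs if p not in corrupted}
    return adversary_view, outputs

def total(*xs):
    return sum(xs)

inputs = {"P1": 3, "P2": 5, "P3": 4}
honest_run = ideal_execution(total, inputs)
attacked_run = ideal_execution(
    total, inputs, corrupted={"P3"}, substitute={"P3": 100}, proceed=False
)
print(honest_run)
print(attacked_run)
```

In the attacked run the simulator learns the output computed on its substituted input while the honest parties receive nothing, which is exactly the fairness failure that the veto is meant to model.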

The whole point of the ideal world is that we assume that the SFE protocol itself runs securely within the trusted third party, and that an adversary may only attack that protocol at the periphery, i.e. from the outside by corrupting the inputs to the protocol, by preventing some parties from submitting inputs to the third party altogether, or by delaying or preventing the trusted third party from announcing the result to the parties. It turns out that reasoning about this world is much easier than reasoning about the real world, where things can get a lot more hairy.

Another way to interpret the ideal world adversary is this: attacks in the ideal world are independent of the SFE protocol and its implementation. They are, in some sense, inevitable. No matter how "secure" our concrete SFE protocol will turn out to be, the least a real-world adversary could achieve is what the ideal world adversary can. It could achieve more than that though, by exploiting weaknesses in the SFE protocol itself. Noting that there is a difference between what both adversaries could achieve will lead the way to a general and very natural definition of SFE security below.

The Real World

In the real world, there is no trusted third party anymore which could compute \(y\) for us in a secure, incorruptible way. In other words: now, the participating parties will run the SFE protocol among themselves, and will have to deal with a much more powerful adversary than the rather benign simulator in the ideal world. But we'll return to this adversary in a moment.

The parties are now connected via a fully connected mesh of connections (each party is connected to each party via a bidirectional link).

Each link is protected by encrypting the content and by adding integrity data: think of e.g. TLS 1.3. These links are based upon cryptographic assumptions like hardness of factoring, hardness of the discrete logarithm problem (DLP), hardness of inverting the encryption algorithm without a decryption key, or the random oracle model for hash functions: therefore, they are only secure in the weaker computational security sense.

In some scenarios, we also assume the presence of a broadcast channel connected to all parties.

The Real-World Adversary

So what can a real-world adversary do? Just like a simulator in the ideal world, the attacker can read transcripts of semi-honest parties, speak on behalf of the parties it controls, and it can additionally modify the behavior of corrupted parties in Byzantine ways, i.e. making them depart from the protocol. For example, the attacker could turn the parties into gossipy ones, which send data to other parties, contrary to what the protocol prescribes (note that this kind of attack was not possible in the ideal world, because of the star topology). It could also have the party send wrong data to (a subset of) the other parties, or simply behave in a totally unpredictable way. This is how the real-world adversary is able to break the SFE protocol: it has additional powers to harm and interfere with the protocol that the simulator doesn't have in the ideal world.

As in the ideal world, it is easy to imagine the real-world adversary as being some external entity different from all parties. The adversary controls some parties by sending them instructions over yet another secure channel. Here too, corrupted parties under control of the adversary won't output anything themselves. Instead, they send their outputs to the adversary, which in turn will do with them as it pleases (relaying them to their final destination, or intercepting and stopping them short, or duplicating them as in replay attacks...).

In practice, the adversary doesn't have to be an external command and control (C&C) server controlling the corrupted parties via encrypted links. Usually, the adversary will be distributed among the corrupted parties, i.e. it will be sitting right inside the corrupted parties' programs and run locally, possibly coordinating themselves via the already established regular TLS 1.3 channels between the parties. The notion of an external adversary controlling some parties is easier to reason with, especially when we'll come to modelling attacks on fairness and guaranteed delivery of output, so we'll stick to that, as it is also standard for security proofs in the literature.

The adversary's goal is to learn as much as possible by inducing the parties it controls to disclose their transcripts, and to behave in any possible way trying to break the SFE protocol, depending on the adversarial model.

Defining and Proving the Security of an SFE Protocol in the Ideal- vs. Real-World Paradigm

Now, we're ready to define what it means for an SFE protocol to be secure in the ideal- vs. real-world paradigm in the general sense, without using heuristics like the security requirements we've learned above (privacy, correctness, independence of inputs, fairness, guaranteed delivery of output).

In the semi-honest setting, defining and proving security of an MPC protocol is much easier than doing so in the malicious setting, which is to be expected. For lack of time, we won't write down the formal definitions nor the simulation proof techniques. Interested readers are encouraged to work through Yehuda Lindell's excellent "How to Simulate it?" tutorial[L17, L18].

(... to do: expand this section)

Feasibility Results

So what kind of security can be achieved with MPC at all? One of the main achievements of cryptology research in the MPC arena was the following set of results[CH17]. Here, \(n\) is the number of parties involved in the MPC protocol (not including the adversary), and \(t \le n\) is the number of parties under the adversary's control. We repeat here the feasibility theorems stated in the general introduction to MPC:

Semi-honest setting

for \(t \lt n/2\), every functionality can be securely computed with perfect security [BGW88, CCD88]

Malicious setting

for \(t \lt n/3\), every functionality can be securely computed with perfect security [BGW88, CCD88]

for \(t \lt n/2\), every functionality can be securely computed with statistical security [RB89]

for \(t \lt n\), every functionality can be securely computed with abort with computational security [GMW87]
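Because the bounds above are strict inequalities, it is easy to be off by one when translating them into a concrete number of tolerated corruptions. The small helper below (a hypothetical name, for illustration only) computes the largest integer \(t\) that satisfies each bound for a given \(n\).

```python
def max_corruptions(n):
    """Largest integer t satisfying each strict bound, for n parties."""
    return {
        "t < n/3": (n - 1) // 3,
        "t < n/2": (n - 1) // 2,
        "t < n": n - 1,
    }

for n in (3, 4, 9, 10):
    print(n, max_corruptions(n))
```

For example, with \(n = 9\) parties the bound \(t < n/3\) tolerates only 2 corrupted parties, not 3, while \(n = 10\) tolerates 3.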

Odds and Ends

Functionalities

(... to be written)

Information-theoretic vs. Computational Security

(... to be written)

Source

This complete crypto bite is my transcript of Prof. Yehuda Lindell's presentation "Definitions and Oblivious Transfer" at the 5th BIU Winter School on Cryptography, Advances in Practical Multiparty Computation, Feb. 15-19, 2015, Bar-Ilan University, Israel (all videos). We're covering the part between 09:10 and 1:01:58 (the remainder is about Oblivious Transfer, which I cover separately in another crypto bite). The embedded video starts with the actual definitions at 09:10, but feel free to rewind to the very beginning for the general Winter School intro.