This is one of the most cited papers in all of computer science. The authors are pictured above around the time their RSA paper was published (Shamir, Rivest, Adleman - left to right). I learned about the mathematics of RSA in 1985 as part of a cryptography course based on Dorothy Denning's excellent introductory text Cryptography and Data Security. Over the years quite a few security professionals have confided in me that they do not really understand the basic mathematics of how RSA works. In this post I hope that I can tell the RSA story one more time, in a way that makes it easier to understand.

Background

Let E be a generic encryption operation, and let D be its corresponding decryption operation. For any message M, a basic property of any cryptosystem is that

M = D(E(M))

or in words, that decrypting an encrypted message yields the original message. We can then think of D(E( )) as a self-mapping, taking messages M to themselves under the action of some key. To explain RSA we are going to look at several well-known self-mapping functions from number theory, and then show how we can decompose these self-maps into two parts, which can be thought of as encryption and decryption.

All references below to numbers will mean non-negative integers: 0, 1, 2, 3 and so on. Number theory is mainly concerned with these numbers and their properties. Also, we will use the terms number and message interchangeably, since any message has a binary representation that can be interpreted as a number (see the standard PKCS #1 on a specific method for doing this).

Modular Exponentiation

Our source of self-mappings will be derived from modular exponentiation, the fundamental operation in many public key systems including RSA, DSA, Diffie-Hellman, and El Gamal systems.

We start with the simpler notion of exponentiation. Exponentiation is the operation of raising a base M to an exponent e, written as M^e, which simply means to multiply M by itself e times. When e = 2 exponentiation is called squaring, and called cubing when e = 3. For larger values of e we also describe M^e as "raising M to the e-th power".

There are two rules which can be used to simplify exponentiations, both of which are used in RSA: the product rule

M^(e*d) = (M^e)^d

and the sum rule

M^(e + d) = (M^e) * (M^d).

In modular exponentiation we are given an additional parameter m and we need to compute M^e mod m. For any value x , x mod m means to return the remainder of dividing x by m. So 17 mod 5 = 2 since 17 = 3*5 + 2, or in words, 5 goes into 17 three times with 2 left over (the remainder). Here m is called the modulus. So M^e mod m means to first compute M^e and then return the remainder upon dividing by m.

Flawed RSA v1

After that introduction, we need one simple result to take us forward, called Fermat's Little Theorem: let p be a prime and let M be any value in the range 0 < M < p, then

M^(p - 1) = 1 mod p

Since M^p = M^(p-1) * M by the exponentiation sum rule, and M^(p-1) = 1 mod p it then follows that

M^p = M mod p.

This is good news since we have now found a self-mapping function. Can we somehow decompose this self-map into two parts, one that we can call encryption and the other decryption?

Well imagine that Alice can find two values e and d such that e*d = p, and let Alice publish the pair (e, p) as her public exponent and her public modulus, respectively. If Bob wants to send her a message M she instructs him to encrypt it using the following modular exponentiation

C = M^e mod p

and Bob then sends C to Alice. When Alice receives C she recovers M by the following modular exponentiation

M = C^d mod p.

But why does this operation return M to Alice? Well its because

C^d = (M^e)^d = M^(e*d) = M^p = M.

Here we have used the exponentiation product rule: the exponentiation applied by Bob is itself exponentiated by Alice to return the original message.

But we have a snag. We said that Alice found e and d such that e*d = p, but since p is prime then the only possible solutions for e and d are 1 and p. But if e = 1 or e = p then the encryption operation M^e mod p applied by Bob has no effect on M since

M^1 = M^p = M mod p.

Encryption would be doing nothing.

But here is the key observation.

The property that permits Alice to recover M is not e*d = p but rather that

e*d = (p - 1) + 1 = 1*(p-1) + 1

and Alice can still recover M if e*d is equal to 2*(p-1) + 1, or 3*(p-1) + 1, or k*(p-1) + 1 for any k > 0. Why? Well using the product rule we have that

M^(k*(p-1)) = M^((p-1)*k) = (M^(p-1))^k = 1^k = 1 mod p,

and then using the sum rule

M^(k*(p-1) + 1) = M^(k*(p-1)) * M = M.

So we are back in business if we can find e and d such that e*d = k*(p-1) + 1, for some k > 0. But using modular arithmetic, this is the same as saying that

e*d = 1 mod (p - 1).

As it turns out there is something called the Euclidean algorithm which is a method for efficiently solving for x in modular equations of the form a*x = b mod m. So we can just pick a random value for e in the range 1 < e < p and then use the Euclidean algorithm to find the corresponding value of d for which e*d = 1 mod (p - 1).

Let's see an example of this. Let p = 61 and then select e as e = 17. Then we have that d = 53 since 17 * 53 = 1 mod 60, or more specifically, 17 * 53 = 15 * 60 + 1. So encryption would then be defined as C = M^17 mod 61 and decryption defined as M = C^53 mod 61. Here our messages would need to be less than 6 bits in length.

But now we have another snag - quite a serious one this time. When Alice publishes her public key (e, p), what distinguishes her from other people is her knowledge of her decryption key d. But since e*d = 1 mod (p - 1) then another person Carol can easily determine d by simply using the Euclidean algorithm to solve e*d = 1 mod (p - 1) since e and p - 1 can be directly derived from the public (e, p).

To fix this snag we need to change our tactics and move from a public modulus that is a prime p to a value n = p*q that is the product of two primes p and q.

Correct RSA v2

The importance of Fermat's Little Theorem in the last section is that it gave us a self-mapping modulo a prime p. There is a generalisation of this result for composite numbers known as Euler's Theorem: For any n let a be any value in the range 0 < a < n, that shares no factors with n then

a^phi(n) = 1 mod n

where phi(n) is the number of values in the range 1 to n that share no factors with n. Note here that for a prime p that phi(p) = p - 1 so that Euler's criterion includes the result of Fermat's Little Theorem.

If we let n = p*q for two primes p and q then it is easy to show that phi(n) = (p-1)*(q-1). Now we can repeat the construction in the previous section by replacing computations with p - 1 (derived from Fermat's Little Theorem) with computations based on phi(n). To this end we select a random e in the range 1 < e < phi(n) and use the Euclidean algorithm to solve for d such that

e*d = 1 mod phi(n) or e*d = 1 mod (p-1)(q-1).

Alice's public key becomes (e, n) and Bob can encrypt a message M for her as M^e mod p. Alice decrypts the message as M = C^d mod p which works because

So the public key of Alice is then (17, 3233) and anyone can encrypt a message for Alice by computing C = M^17 mod 3233. Alice recovers the message by computing M = C^2753 mod 3233.

By now you should be able to follow this T-shirt description of RSA.

Let's think about Carol again. Can she easily determine Alice's private key d from her public parameters (e, n)? Well if she could determine phi(n) from n then she could use the Euclidean algorithm to solve for d in the equation e*d = 1 mod phi(n). But as was shown in the original RSA paper, if Carol could derive phi(n) from n then this would be equivalent to Carol having the power to factor n, which is known to be a computationally hard problem.

We have a left out many technicalities in this description of RSA, and to implement RSA in practice you need to solve many other issues, such as how to find large primes. For RSA to be secure it must be the case that n is sufficiently large to defeat the best known factoring algorithms. Using current estimates p and q should be at least 500 bits in length, and often over 1000 bits for additional security. The construction for RSA as outlined above is totally general - you can pick the primes as small or large as you like - the math still all goes through. However, larger keys takes longer times to process.

Wednesday, June 18, 2008

Pictured left is Dan Egerstad, a 21-year old Swedish computer security consultant (at times also called a hacker, researcher and other colourful titles) who approaching a year ago now ran afoul of various authorities for his involvement in the exposure of account details harvested from the Internet-based anonymity service Tor. He obtained password information for 1,000 e-mail accounts belonging to foreign embassies, corporations and human rights organizations, and posted some details to his web site. For his trouble he was taken to the local police headquarters for questioning, and various computing equipment and written material were seized from his domicile. Egerstad claims he only wanted to bring attention to the inherent weaknesses in Tor that he expects are not well-known to its users. Another case of irresponsible disclosure or over-zealous awareness? Before examining the details of the Egerstad case we need some background on anonymity (the research material used to write this article was collected in Freemind and is available as a flash web application here).

When we speak of anonymity here we mean anonymity with respect to a digital action, such as sending an email, paying a bill or browsing a site. Such actions are considered to be anonymous if they can be attributed to nobody or attributed to everybody. In practice it is very difficult to ensure that digital actions are attributed to nobody since Internet protocols are very generous in the disclosing information as part of their operation. On the other hand, building systems that attribute actions to everyone is also impractical. Deployed internet anonymity solutions take the approach of making digital actions attributable to a sufficiently large set of users. So while it would be nice to have an anonymous email system whose messages could be attributed to anyone on the planet (once they all have email), the sender may feel that their anonymity is sufficiently protected if the system renders their email potentially attributable to a million people who were online when the email was sent. Anonymity loves a crowd, as the saying goes.

We will now give a short overview of how two well-known anonymity services work, and then get back to the transgressions of Mr. Egerstad.

Dealer-Based Systems

In dealer-based systems, we imagine a large number of people sitting around a big table, with one person, Dixie, distinguished as the dealer. Each person at the table can send or receive a message from someone else at the table. When Alice wants to send a message to Bob, she puts the message in an envelope, writes Bob's name on the front, and passes the addressed envelope face down to Dixie. Dixie accepts the envelope from Alice, and continues collecting envelopes from people until she collects a pre-determined number of messages, let's say 100. Dixie then shuffles the 100 envelopes until their order is random as compared to the order in which they were received, and she then deals the envelopes out as they are addressed.

Someone at the table watching all this, say Eve, learns that Alice gave a message to Dixie, and that one of the people who was dealt an envelope by Dixie was her intended recipient (which must include Bob). So the anonymity of Alice's receiver might be said to be 1-in-100 assuming that the 100 messages collected by Dixie were delivered to 100 different people. If Bob is a popular fellow then he may have 10 messages dealt to him by Dixie, and the anonymity of Bob as Alice's receiver is reduced (there are fewer possible receivers to consider).

Here the value of 100 is called the threshold, and reaching the threshold number of messages triggers Dixie to shuffle and deliver. Alice may search for another table where the dealer Debbie has a threshold of 1000 say, because she wants to hide her recipient amongst a larger set of people. Alice will wait longer for her messages to be delivered since it will take Debbie more time to collect her threshold of 1000 envelopes as compared to Dixie's 100. There is tradeoff here between anonymity and latency: that is, how many other messages your message is shuffled with (the size of the threshold) and the delay to deliver of your messages (waiting for the dealer to gather the threshold).

The dealer can pull other tricks. A dealer Dylan may have a threshold of say 500 messages but on the first collection he actually accepts 600 envelopes. The 600 envelopes are shuffled, but he puts aside 100 (called the pool) and then delivers the other 500. The next 500 received messages are shuffled with the current pool, 100 set aside for the next pool, and the remaining 500 messages delivered. The pool breaks the direct correlation between when Alice submits her message and when it is dealt by Dylan. Eve will need to consider more potential receivers to determine who Alice might be communicating with.

What we have described here as dealer-based systems gives the simple intuition behind MIX anonymity systems, originally proposed by David Chaum, a prolific and influential cryptographic researcher. Chaum's seminal ideas have been greatly elaborated on in the last 20 years or so, and for a practical internet service providing anonymity based on the MIX principle, please see JonDonym.

Lotto-Based Systems

Theother class of anonymity systems we will consider shall be referred to as lotto-based systems (these are the ones that got Mr. Egerstad in hot water). We are proceeding by analogy as before with dealer-based systems. In the game of lotto players are attempting to predict which balls will be selected by a mechanical process that seems very random. A collection of numbered balls (say 40) are released into a clear and hard plastic sphere or cylinder, which when rotated, causes the balls to bounce around in an unpredictable manner. After a "sufficiently large" number of revolutions, an arm reaches into the sphere and catches one of the balls as it is bouncing around. The selected ball becomes the first number. The arm then reaches in again, catches another ball, and so on, until the required number of balls to complete the game are selected (say 6). All the time the sphere is rotating, and the remaining balls bounce off each other and the hard surface of the sphere.

Imagine now that instead of adding 40 balls at the beginning to the sphere, we keep on adding new balls, for example at the rate of a few per second. Also imagine that the arm does not just pull out 6 balls but rather it continuously extracts one ball at a time. We might think of this setup as the perpetual lotto system with balls continuously entering and leaving the system. If we put a ball into the sphere, how long would it take to come out? If we took the numbers off the balls (let them all just be white balls), when would we know that a given ball has come out?

These principles underpin several anonymity systems, such as Crowds. We replace the balls with messages and the rotating sphere becomes a collection of servers that randomly route (bounce) the messages amongst themselves before delivering the message to the intended recipient. In dealer-based systems the anonymity of Alice's receiver was tied to the size of the threshold used by the dealer. In lotto-based systems Alice's message is potentially mixed with all messages in the system at the same time, which may be a large depending on the traffic rate into the system.

Tor - The onion router

Tor is a particular, and seemingly popular, implementation of an anonymity network that falls into our simple lotto-based classification. Tor is short for "The onion router", and is considered to be a second generation version of onion routing for anonymous communication (onion routing 2.0 in common parlance). Here an onion refers to a message that is encrypted multiple times, or in layers. When Alice wants to send a message she first chooses a path through the Tor network based on a list of available servers. So instead of her message bouncing around randomly in the system (as in Crowds), she actually chooses the route before submitting the message. To protect her selected path she encrypts the message with the public key of the last server in the path, then encrypts that encrypted message with the key of the second last server in the path, and so on, for each server in her chosen path. The following diagram shows the process (see this wikipedia article for better resolution and more explanation).

Once all the layered encryption is completed, Alice sends her encrypted message (now called an onion) to the first server she selected in the path. The server decrypts one layer of the onion to reveal yet another onion and the next server in the path. The first server sends the onion to the second server in the path who decrypts the next layer and forwards the remaining onion to the next named server in the path, and so on. The final server in the path decrypts the onion one final time to reveal the original message and the intended recipient. The final server delivers the message to Alice's intended recipient.

Back to Dan

In Tor there are 3 roles for servers nodes: entry nodes that accept messages into the network, routing nodes that decrypt one onion layer and then forward the remaining onion, and finally exit nodes that deliver the message out of the network to the intended recipient. Any physical server in the Tor network can act in any of these node roles. What is revealed about Alice? Well the entry node sees her IP address, while the routing nodes learn the previous and next nodes in the path. However the exit nodes learns the message itself and the address of the recipient. If Alice wants this last hop to be secure then she must explicitly use a protocol like SSL to protect her message. But apparently this is a little known fact as Dan Egerstad demonstrated. This caveat is (now) clearly indicated on the Tor project site, and a general warning: Be smart and learn more. Understand what Tor does and does not offer. This list of pitfalls isn't complete, and we need your help identifying and documenting all the issues.

Servers can be donated to the Tor network to act as nodes, and currently there are about 1500 nodes. So Dan Egerstad created/donated 5 exit nodes and sniffed the traffic contents of these nodes as they forwarded traffic out of the Tor network. He was not targeting any specific users of Tor, only looking at the traffic that was routed to his exit node as part of the overall Tor protocol.

Surprisingly Egerstad obtained login and password information for over 1,000 e-mail accounts belonging to foreign embassies, corporations and human rights organizations. More specifically he harvested information on users from embassies belonging to Australia, Japan, Iran, India and Russia, and also accounts information belonging to the foreign ministry of Iran, the United Kingdom's visa office in Nepal and the Defence Research and Development Organization in India's Ministry of Defence. Egerstad ensured controversy (and more than that actually) when he published the login and password details for 100 select email accounts, using the justification that he felt it would be the most effective way to make the account owners aware that their communication had been compromised. Mr. Egerstad has stated that there is no security flaw with Tor - the real threat comes from user expectations that their message contents are being protected end-to-end by Tor, when in fact encryption is only applied to internal Tor network communication.

Lessons Learned

Mr. Egerstad has highlighted that Tor obfuscates the path that a message follows through the Tor network (protecting the sender), but confidentiality is not provided at the exit point to the network. Of course the Tor people have documented this property, but there is nothing like an incident to drive the message home. Users should also understand that the role of the exit nodes is still based on the honour system, for both delivery and respecting the privacy of message content.

The main threat addressed by the design of Tor is protecting the identity of Alice from eavesdroppers - that is, it should be difficult to correlate the identity of a sender through the routing of her messages in the Tor network. So if an attacker is targeting Alice they may have to compromise a large number of nodes (or introduce a large number of their own nodes) into the Tor network to trace a path back to Alice. This sounds like hard work. Mr. Egerstad has shown how to harvest account details by merely attaching a new exit node and eavesdropping on the Tor business-as-usual traffic that is routed through the node by user path choices. So while passive eavesdropping may not compromise a specific person like Alice, it can still reveal valuable security information for the attacker.

Providing open internet-based anonymity services is technically difficult, and the current set of services being offered approximate true anonymity. True anonymity, or anything approaching it, can only be provided in closed systems where all participants work synchronously, sending messages of a predetermined size at predetermined times. It would take a long time to justify this statement. I refer the interested reader to the excellent set of lectures on anonymity primitives and communication by George Danezis. Research in providing better anonymity systems is still an active area.

The research material used to write this article was collected in Freemind and is available as a flash web application here.

Monday, June 2, 2008

In 2003 the Computer Research Association sponsored a workshop on the Grand Research Challenges in Information Security & Assurance. The by-invitation-only event brought together 50 scientists, educators, business people, futurists, and others who have some vision and understanding of the big challenges (and accompanying advances) that should shape the research agenda in this field over the next few decades. The final report listed 4 main challenges worthy of sustained resourcing and effort:

Give end-users security controls they can understand and privacy they can control for the dynamic, pervasive computing environments of the future

Develop quantitative information-systems risk management to be at least as good as quantitative financial risk management within the next decade.

In the 4th challenge security (risk) professionals are being asked to follow the yellow brick road to the emerald city of quantitative financial risk management (QFRM) and the wizards therein. A recent article from a May issue of the Economist examines the state of QFRM in light of the subprime debacle, highlighting the $30 billion write down of UBS (Used to Be Smart) as the (sub)prime example of flawed risk management. The outlook in the emerald city is professionally gloomy.

One of the main quantitative culprits identified by the Economist is Value-at-Risk, usually written as the contraction VaR (presumably to distinguish it from VAR and Var, long standing contractions for the Variance). VaR is the centrepiece in many QFRM toolboxes, being honoured with inclusion in the Basel II guidelines for calculating reserve capital. But the subprime debacle has highlighted one the well-known weaknesses of VaR - that it is not strong at predicting the low-probability/high-impact events that are attendant to catastrophe.

VaR is essentially a simple concept supported by arbitrarily complex modelling (see this paper from the upcoming WEIS 2008 conference for VaR applied to information security). Let A be an asset for which we may realise a loss over a defined time period T. Given a level of significance a, the VaR of A is the maximum loss L that will occur over the time period T with probability 1 - a. So if A is a stock of interest over the next T = 100 days, and we fix a to be 0.01, then the VaR of A is the maximum loss L that will occur over the next 100 days with probability 0.99 (or 99% of the time). The interpretation here is that the losses from A will be at most L on 99 out of 100 days of trading.

But what about that other one day of trading not covered? The magnitude of the loss on that rogue day is outside the scope of the VaR model, and need not be proportional to the loss bound predicted by VaR. In fact, VaR is designed to answer questions of the form "Within reason, how bad can things get?", which seem very sensible until we acknowledge that the subprime debacle was not "within reason". As the Economist observes, after a few years of bootstrapping and estimation, VaR models are transferred onto historical data and settle down to predicting a stable future from a stable past, leading to the conceit that risks can be quantified and regulated.

One pyrrhic benefit from the subprime debacle is that VaR models can now be recalibrated with catastrophic data sets, and should therefore produce better predictions. The Economist notes a new interest in non-statistical models based on enumerating risk scenarios that describe what could go wrong, and then thinking through the consequences of the scenario crystallizing. Scenario generation is typically a people-intensive activity, facilitated through brainstorming and workshops - not the forte of quants. Nonetheless scenario-driven risk analysis (SDRA) has the ability to uncover root causes and dependencies that may be absent or insufficiently weighted in quantitative models. On the other hand, the SDRA may fail to generate an exhaustive set of material scenarios, and more mundanely, poorly facilitated sessions can lead to futile bickering over rating the significance of scenarios.

Regardless of the model used, the Economist notes that risk management is becoming less tractable due to complexities and dependencies. Partitioning risks into those related to credit, markets and liquidity is no longer sufficient since the risk inherent in some of the subprime financial products did not respect these organisational boundaries. In short, we have entanglement. Further, obtaining aggregate risk positions is becoming more difficult since some departments still maintain their risk exposure on desktop Excel models, and over-the-counter dealings that are not formally traded also contribute to uncertain aggregate positions. For shareholders, and even regulators, it is very difficult to unwind and assess the risk exposure of a company.

What conclusions might we have for IT (Security) risk management? Rich Bejtlich has commented on the Economist article, and made direct comparisons to the difficulties of risk in financial environments to those in IT environments. The good news is that we in IT Risk should no longer feel compelled to wed our futures to the QFRM yellow brick road, and perhaps we are better served by SDRA. We can also stop beating ourselves up on the point that the weakness of IT Risk is the absence of data - the real weakness is poor modelling, and the decisions based on the output of such models. The Computer Research Association grand challenges of 2003 may be just too grand, and in fact unnecessary.