<h1>Would Rust have prevented Heartbleed? Another look</h1>
<p><em>March 17, 2015</em></p>
<p>In case you haven’t heard, <a href="http://www.mail-archive.com/openssl-announce@openssl.org/msg00169.html" rel="nofollow">another serious OpenSSL vulnerability will be announced this Thursday</a>. It reminded me of about a year ago, when Heartbleed was announced:</p>
<p><a href="https://twitter.com/rust_hulk/status/453294938763976704" rel="nofollow"><img src="https://cleancrypt.org/img/memorysafety.png" alt="RUST HULK SAYS YES"></a></p>
<p><a href="https://air.mozilla.org/bay-area-rust-meetup-december-2014/" rel="nofollow">In December 2014 I gave a talk at Mozilla about cryptography in Rust</a> <a href="https://speakerdeck.com/tarcieri/thoughts-on-rust-cryptography" rel="nofollow">(slides here)</a>. I have been meaning to do a followup blog post both about my talk, reactions I received from it, and my subsequent thoughts…</p>
<p><a href="http://www.tedunangst.com/flak/post/heartbleed-in-rust" rel="nofollow">And then this blog post happens</a>. I have been reading Ted Unangst’s blog for quite awhile, mostly with great respect. This particular blog post was, unfortunately, not up to his usual standards. He blogs on a wide range of topics, but security is a complicated field and this blog post is, in my opinion, highly misleading. Ted claims he implemented “Heartbleed” in Rust. Is that actually the case?</p>
<p>In my talk at Mozilla, I covered several of the SSL/TLS bugs seen in 2014, and spent a lot of time covering “goto fail” (SecureTransport) and “goto cleanup” (GNUTLS). I spent some 15 minutes discussing these vulnerabilities (and how Rust could’ve helped), and probably about 15 seconds talking about Heartbleed, because I thought the severity of Heartbleed was obvious enough that the case for Rust’s memory safety would be equally obvious. Apparently I assumed too much. So let’s dig into Heartbleed and Ted’s alleged Rust version of it and see what’s really going on.</p>
<p>Let’s talk about Tedbleed! What’s going on? Is it as bad as Heartbleed, and are Rust’s memory safety features being oversold by uninformed zealots? Let’s take a look!</p>
<h1>Tedbleed</h1>
<p>Here is Ted’s source code:</p>
<pre><code class="rust">use std::old_io::File;
fn pingback(path : Path, outpath : Path, buffer : &amp;mut[u8]) {
let mut fd = File::open(&amp;path);
match fd.read(buffer) {
Err(what) =&gt; panic!("say {}", what),
Ok(x) =&gt; if x &lt; 1 { return; }
}
let len = buffer[0] as usize;
let mut outfd = File::create(&amp;outpath);
match outfd.write_all(&amp;buffer[0 .. len]) {
Err(what) =&gt; panic!("say {}", what),
Ok(_) =&gt; ()
}
}
fn main() {
let buffer = &amp;mut[0u8; 256];
pingback(Path::new("yourping"), Path::new("yourecho"), buffer);
pingback(Path::new("myping"), Path::new("myecho"), buffer);
}
</code></pre>
<p>Let’s locate the problematic part of this code:</p>
<pre><code class="rust"> let buffer = &amp;mut[0u8; 256];
</code></pre>
<p>Uh-oh: this code reuses a mutable buffer between calls, mixing up data from one request with the next. So what exactly is the severity?</p>
<ul>
<li>
<strong>Problem</strong>: Reused mutable buffer</li>
<li>
<strong>Severity</strong>: Plaintext recovery</li>
<li>
<strong>Worst case scenario</strong>: An attacker can recover arbitrary plaintexts from encrypted traffic</li>
</ul>
<p>Ouch! This is a bad bug that the Rust compiler failed to prevent. First we might ask if this is the sort of code that a Rust programmer would actually write in practice. My answer?</p>
<p><em>Absolutely</em>.</p>
<p>To Ted’s credit, this isn’t strawman code, or at least not entirely: while this specific rendition might be contrived, Rust programmers definitely do reuse mutable buffers to avoid allocations, particularly in these sorts of I/O buffering scenarios.</p>
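<p>For illustration, here’s a minimal sketch of one way to keep the buffer reuse while closing the leak, written in modern Rust since the pre-1.0 <code>std::old_io</code> API above no longer exists. This is my own hypothetical fix, not Ted’s code: the claimed length is clamped to the number of bytes actually read, so stale data from a previous call can never be echoed.</p>
<pre><code class="rust">use std::cmp;
use std::fs::File;
use std::io::{Read, Result, Write};
use std::path::Path;

fn pingback(path: &amp;Path, outpath: &amp;Path, buffer: &amp;mut [u8]) -&gt; Result&lt;()&gt; {
    let mut fd = File::open(path)?;
    // Remember how many bytes were actually read on *this* call...
    let n = fd.read(buffer)?;
    if n &lt; 1 {
        return Ok(());
    }
    // ...and never echo beyond them, no matter what length the first byte
    // claims. This clamp is the check the version above is missing.
    let len = cmp::min(buffer[0] as usize, n);
    let mut outfd = File::create(outpath)?;
    outfd.write_all(&amp;buffer[0..len])
}

fn main() -&gt; Result&lt;()&gt; {
    let buffer = &amp;mut [0u8; 256];
    pingback(Path::new("yourping"), Path::new("yourecho"), buffer)?;
    pingback(Path::new("myping"), Path::new("myecho"), buffer)
}
</code></pre>
<p>Note that even the buggy version can’t read out of bounds: if <code>len</code> ever exceeded the buffer’s length, the slice operation would panic rather than read adjacent memory. That bounds check is exactly the memory safety property at issue here.</p>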
<p>And it’s not just Rust. A very similar vulnerability, called <a href="http://blog.gdssecurity.com/labs/2015/2/25/jetleak-vulnerability-remote-leakage-of-shared-buffers-in-je.html" rel="nofollow">JetLeak</a>, happened in Java fairly recently, posing a similar threat of recovering other connections’ plaintexts because of improper handling of mutable buffers.</p>
<p>But is it Heartbleed?</p>
<h1>Heartbleed</h1>
<p>What was Heartbleed?</p>
<ul>
<li>
<strong>Problem</strong>: Improper pointer arithmetic resulting in out-of-bounds memory reads</li>
<li>
<strong>Severity</strong>: Memory exposure and private key recovery</li>
<li>
<strong>Worst case scenario</strong>: An attacker can perform out-of-bounds reads of values from a process’s memory. Sophisticated attacks allowed for the recovery of SSL/TLS private keys or other sensitive data in-memory.</li>
</ul>
<p>This is a lot worse than “Tedbleed”. An analogy might be the telephone network when the phreaks first started exploiting it. The “Cap'n Crunch” whistle worked by exploiting something known as in-band signaling. That is to say: the phone network provides a communication medium, but it also needs control signals. Where “Tedbleed” might let us snoop on someone’s phone calls, Heartbleed lets us take over the phone network and impersonate the phone company, because we have access to more than just the signal: we have the keys to the kingdom.</p>
<p>Heartbleed is a vulnerability rooted in the fact that C is not a memory safe language.</p>
<p>Rust is. Unless you venture into the (explicitly demarcated) unsafe portion of Rust, you will not see memory exposure vulnerabilities like Heartbleed which are due to improper bounds checking. You will likewise not see the much more severe “<a href="http://www.securitysift.com/exploiting-ms14-066-cve-2014-6321-aka-winshock/" rel="nofollow">Winshock</a>”-style remote code execution vulnerabilities.</p>
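<p>To make the distinction concrete, here’s a toy reconstruction of mine (not OpenSSL’s actual code) of the Heartbleed pattern: echoing back however many bytes the peer <em>claims</em> the payload contains, without checking that claim against its real length. In C, the resulting <code>memcpy</code> reads out of bounds; in safe Rust, the slice is bounds-checked and the program panics instead of leaking adjacent memory.</p>
<pre><code class="rust">// Toy heartbeat handler: echo back `claimed_len` bytes of `payload`.
fn heartbeat_response(payload: &amp;[u8], claimed_len: usize) -&gt; Vec&lt;u8&gt; {
    // In C, memcpy(out, payload, claimed_len) happily copies past the end
    // of the buffer. In safe Rust this slice is bounds-checked at runtime:
    // if claimed_len &gt; payload.len(), the program panics instead of
    // reading out of bounds.
    payload[..claimed_len].to_vec()
}

fn main() {
    let payload = b"ping";
    // A well-formed request echoes the payload back.
    assert_eq!(heartbeat_response(payload, 4), b"ping");
    // A malicious request claiming 64KB aborts with a panic, rather than
    // disclosing up to 64KB of whatever lies beyond the buffer.
    heartbeat_response(payload, 65536);
}
</code></pre>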
<p>Memory safety is paramount to writing secure programs.</p>
<p>We can’t get the keys with Tedbleed.</p>
<p>We can with Heartbleed.</p>
<p>Tedbleed is an entirely different class of vulnerability from Heartbleed. Where Tedbleed exposes the contents of a particular, bounded buffer to an attacker, Heartbleed exposed the memory of an entire process. No matter what the value was, including SSL/TLS private keys, Heartbleed could be used to write it onto the wire.</p>
<h1>Conclusion</h1>
<p>Rust is a memory safe language.</p>
<p>C is not a memory safe language.</p>
<p>Writing programs in Rust prevents a wide range of attacks that result from commonplace errors made in C programs. These errors are made by novices and experts alike. When you read security announcements, these sorts of errors are often described as being corrected with “improved bounds checking”, a.k.a. fixing arithmetic. Unfortunately this class of error is exceedingly common, and it often results in remote code execution vulnerabilities.</p>
<p>Ted is wrong: Rust would’ve prevented Heartbleed. Ted went out of his way to make a strawman version of Heartbleed, and created a vulnerability which does not allow out-of-bounds memory reads, but instead looks a lot more like JetLeak.</p>
<p>I hope it’s clear to anyone who actually cares about security that a memory exposure and key disclosure vulnerability is more severe than a plaintext recovery vulnerability, and that memory safety confers a wide range of security benefits on programs.</p>
<p>Rust would’ve prevented Heartbleed, but Heartbleed is actually kind of boring compared to remote code execution vulnerabilities like <a href="http://www.securitysift.com/exploiting-ms14-066-cve-2014-6321-aka-winshock/" rel="nofollow">Winshock</a> or <a href="http://www.phreedom.org/research/exploits/apache-openssl/" rel="nofollow">openssl-too-open</a>. Remote code execution vulnerabilities are far scarier, and largely preventable in Rust due to its memory safety.</p>
<p>I’m also quite curious if the new OpenSSL vulnerability will involve memory corruption…</p>
<h1>Volapük: A Cautionary Tale for Any Language Community</h1>
<p><em>January 20, 2015</em></p>
<p>You may have heard of artificially constructed spoken languages such as Esperanto, Interlingua, or Lojban, but did you know that before any of these languages there was another constructed language which once claimed nearly a million followers, making it the most popular constructed language of all time?</p>
<p>That language was <a href="http://en.wikipedia.org/wiki/Volap%C3%BCk" rel="nofollow">Volapük</a>, and despite achieving such a high number of speakers, it is nowadays all but forgotten, even among linguists. I think Volapük’s story is one worth telling, because the reasons for its downfall are worth knowing for any member of a language community.</p>
<p>Volapük was created by Johann Martin Schleyer between 1879 and 1880. Schleyer was a Roman Catholic priest who claimed that one night God himself spoke to him, commanding him to create a universal language to unite the European peoples. Volapük was a hybrid of English, German, and French, although you probably couldn’t spot the English influence through the umlauts.</p>
<p>Enter Auguste Kerckhoffs, perhaps best known for <a href="http://en.wikipedia.org/wiki/Kerckhoffs%27_principle" rel="nofollow">his eponymous principle from cryptography</a> which we now remember with the aphorism “security by obscurity” (or rather, that security by obscurity doesn’t work). Kerckhoffs was an early Volapük enthusiast and performed a rather essential chore for popularizing any constructed language: translating learning materials about the language into other languages so speakers of those respective languages could teach themselves Volapük. And thus the seeds for a common European language were sown.</p>
<p>Volapük was a hit! Volapük clubs started popping up throughout Europe. Large conventions were held first in Friedrichshafen in 1884, then Munich in 1887, and finally Paris in 1889. The first two conventions were held in German, but by the third conference, everyone was speaking in Volapük, even the waiters!</p>
<p>Kerckhoffs, who was an early friend and popularizer of the language, would subsequently sow the seeds for its destruction. Kerckhoffs was unhappy with some parts of the language and thought they could be improved. He came to hold the rank of Director of the Academy of Volapük, and felt he had rightfully earned enough influence to shape the future direction of the language. He proposed a number of reforms to Volapük which Schleyer rejected. This led to a schism between the followers of Schleyer, the language’s creator, and the followers of Kerckhoffs.</p>
<p>The language fragmented, and in doing so, lost the very thing which made it unique: its universality. And thus the whole thing fell apart. People began to move on to newer, better “universal” languages like Esperanto, and the dream of a common language spoken by everyone was lost, or rather, it’s gradually becoming English by default.</p>
<p>But I think there’s a larger lesson to be learned here: any language is only as strong as its community, and there are constantly things that divide a community in two (or worse). There are several examples of this sort of thing in the history of programming language development: Perl 6, Python 2 vs Python 3, Paul Phillips forking the Scala compiler (having formerly been its primary author), and the Node.js fork IO.js.</p>
<p>None of these examples is directly analogous to the story of Volapük, but I feel there’s a more general point to be made, and it’s a sociological point instead of a technological point:</p>
<p>If you can avoid a fundamental schism in your language community, you probably should. To maintain a healthy community, everyone, particularly the people most instrumental in the development of a language, needs to work together to keep the community cohesive. Conversely, major schisms or breakdowns in the relationships and development of a language and its community are big warning signs that should make you think twice about the future of a language. You may just be learning the next Volapük.</p>
<h1>CREAM: the scary SSL attack you've probably never heard of</h1>
<p><em>November 11, 2014</em></p>
<p><a href="http://img.svbtle.com/g25njwnhwhi7ng.png" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/g25njwnhwhi7ng_small.png" alt="cycles.png"></a></p>
<p>2014 was a year packed full of newly discovered SSL<sup>†</sup> attacks. First we found Java was vulnerable to a new type of “Bleichenbacher” attack. Apple’s SecureTransport, used by both iOS and OS X, went down next with the “goto fail” vulnerability. GNUTLS was vulnerable to a man-in-the-middle attack. OpenSSL perhaps came out as the most notorious with the Heartbleed attack. The NSS library, used by Chrome and Firefox among others, was vulnerable to yet another Bleichenbacher attack known as BERserk. The Microsoft SChannel library used by Windows was vulnerable to a particularly scary remote code execution vulnerability. At least two protocol-level vulnerabilities in SSL were widely circulated: the triple-handshake attack and POODLE. And we still have over a month left in the year!</p>
<p>While 2014 is a notable outlier in terms of the sheer number of attacks discovered and the publicity they’ve received, these sorts of attacks are nothing new. The 2002 “openssl-too-open” attack allowed remote code execution attacks against OpenSSL, making it worse than Heartbleed (but on par with the recent SChannel attack). However, it happened at a time when the Internet was less essential for most people’s day-to-day lives, so perhaps it’s little more than a historical footnote at this point.</p>
<p>Okay, so I’ve just named off a ton of attacks, and your eyes might be glazing over. But you might be wondering why I haven’t even mentioned CREAM yet. Like “openssl-too-open”, CREAM is an old OpenSSL attack dating back to 2005. But where the main takeaway from attacks like “openssl-too-open” is probably “C is too dangerous to use for writing libraries like OpenSSL”, CREAM is an attack that has had a profound effect on cryptography, to the point that many of cryptography’s practitioners spend much of their time worrying about its ramifications.</p>
<p>If you dabble in cryptography, you may have heard of cache timing attacks, but haven’t specifically heard of CREAM. What is CREAM and why is it so bad?</p>
<p><a href="http://cr.yp.to/antiforgery/cachetiming-20050414.pdf" rel="nofollow">CREAM is a cache timing attack that was used against OpenSSL’s implementation of AES</a>. It allows an attacker on one computer to extract AES keys from another computer over a network. The attack works by measuring round trip timings of known plaintexts encrypted under AES by OpenSSL running on the victim’s computer.</p>
<p>That’s right: simply by measuring minute timing discrepancies over a network, an attacker could extract AES keys from another computer, making it almost as severe as Heartbleed. These timing discrepancies occurred because AES uses a design element known as an S-box, which is effectively a table whose elements we look up based on the AES key. Unfortunately, CPUs are extremely eager to optimize these sorts of lookups with caches, and because the lookups are ultimately based on the key, they introduce what’s known as a <em>side-channel</em>.</p>
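<p>Here’s a hypothetical sketch of mine (not OpenSSL’s code) showing both the dangerous pattern and one classic mitigation: instead of indexing the table with a secret, read <em>every</em> entry and mask out all but the one you want, so the memory access pattern no longer depends on the key.</p>
<pre><code class="rust">// Dangerous: which cache line gets touched depends on the secret index.
fn sbox_lookup_leaky(sbox: &amp;[u8; 256], secret_index: u8) -&gt; u8 {
    sbox[secret_index as usize]
}

// Mitigation: scan the whole table with a branch-free mask, so the memory
// access pattern is identical for every possible secret.
fn sbox_lookup_scanning(sbox: &amp;[u8; 256], secret_index: u8) -&gt; u8 {
    let mut out = 0u8;
    for i in 0..256u16 {
        let diff = (i as u8) ^ secret_index;
        // mask is 0xFF exactly when i == secret_index, 0x00 otherwise.
        let mask = ((diff as u16).wrapping_sub(1) &gt;&gt; 8) as u8;
        out |= sbox[i as usize] &amp; mask;
    }
    out
}

fn main() {
    let mut sbox = [0u8; 256];
    for i in 0..256 {
        sbox[i] = (i as u8).wrapping_mul(31).wrapping_add(7); // toy S-box
    }
    assert_eq!(sbox_lookup_leaky(&amp;sbox, 42), sbox_lookup_scanning(&amp;sbox, 42));
}
</code></pre>
<p>The scanning version performs 256 reads where the leaky one performs a single read, which is part of why constant-time software AES is so much slower than the table-driven kind, and why hardware support matters so much.</p>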
<p>Now’s the part in the blog post where I admit the title is clickbait, but hey, <a href="http://www.clickhole.com/" rel="nofollow">I learn from the best</a>. This attack has never been branded as CREAM before, but if Heartbleed (or BEAST, CRIME, BREACH, and POODLE among others) is any lesson, one of the best ways to raise awareness about an attack is to give it a silly name. Therefore, I am (un)officially branding this particular attack as <em>Cache Rules Everything Around Me</em>.</p>
<p><a href="http://img.svbtle.com/q6lsehz7xtbtvw.gif" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/q6lsehz7xtbtvw_small.gif" alt="gates-pie1.gif"></a></p>
<p>How can we avoid getting CREAMed? There’s only one option: we must close this side-channel and ensure any cryptographic operations we perform are constant-time. The enemy here is <em>data-dependent timing</em>: things like branching on secrets or doing address lookups based on secrets. Unfortunately, the way AES was designed, one of the main things we’d like to do is a table lookup to implement the AES S-boxes. This makes it difficult to implement AES correctly in pure software.</p>
<p>Fortunately, Intel solved this problem… for the hyperspecific case of AES. Newer Intel CPUs (along with those from other vendors, including ARM) now provide a fast, constant-time implementation of AES in hardware. That’s great for AES, but there’s a more general lesson to be drawn from CREAM.</p>
<p>Ideally, ciphers simply wouldn’t contain elements like S-boxes that are difficult to implement in constant time. Instead, ciphers would be designed with <a href="http://mechanical-sympathy.blogspot.com/" rel="nofollow">mechanical sympathy</a> in mind and be entirely composed of operations that are easy (or at least easier) to implement in constant time. And indeed, many modern ciphers, such as ChaCha20, are designed with this principle in mind.</p>
<p>But good cipher design alone isn’t enough: what should we do when we have to implement a cipher like AES in software? This is where things get rather tricky. Whenever someone with a crypto background gets a bit short with you about not rolling your own crypto, attacks like this are why.</p>
<p>First, there are rules we can follow to avoid timings that are dependent on secret data. The <a href="https://cryptocoding.net/index.php/Coding_rules" rel="nofollow">cryptocoding.net coding rules</a> describe some steps we can take to avoid these problems (in addition to some more general advice):</p>
<ol>
<li>Compare secret strings in constant time</li>
<li>Avoid branchings controlled by secret data</li>
<li>Avoid table look-ups indexed by secret data</li>
<li>Avoid secret-dependent loop bounds</li>
<li>Prevent compiler interference with security-critical operations</li>
<li>Prevent confusion between secure and insecure APIs</li>
<li>Avoid mixing security and abstraction levels of cryptographic primitives in the same API layer</li>
<li>Use unsigned bytes to represent binary data</li>
<li>Use separate types for secret and non-secret information</li>
<li>Use separate types for different types of information</li>
<li>Clean memory of secret data</li>
<li>Use strong randomness</li>
</ol>
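<p>To make rules 1 and 2 concrete, here’s a minimal sketch of a constant-time comparison (my own example, which treats the lengths as public): the tempting version that returns at the first mismatching byte leaks the mismatch position through timing, whereas this one always touches every byte and only inspects the accumulated result at the end.</p>
<pre><code class="rust">fn constant_time_eq(a: &amp;[u8], b: &amp;[u8]) -&gt; bool {
    // Lengths are treated as public here; if yours are secret, this early
    // return is itself a timing leak.
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching on them
    }
    diff == 0
}

fn main() {
    assert!(constant_time_eq(b"secret MAC value", b"secret MAC value"));
    assert!(!constant_time_eq(b"secret MAC value", b"forged MAC value"));
}
</code></pre>
<p>Even this is only as constant-time as the compiler leaves it, which is what rule 5 is about: production implementations verify the emitted assembly or use a vetted library rather than trusting the optimizer.</p>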
<p>Okay great, we have a set of rules to follow, but is that enough? While rules are good guidelines, that’s all they are. Even if we try to follow them, what if we make mistakes? What if the CPU behaves in a way that we don’t expect? It would seem trying hard to follow the rules is necessary but not sufficient to implement software which is truly constant-time.</p>
<p>The next thing we need to do is measure. We can better understand the behavior of a particular CPU by measuring it empirically. However, where we might traditionally measure on the granularity of milliseconds or microseconds, for cryptography our measurements need to be more precise. Modern timing attacks combine a large number of samples with statistical analysis to extract even a tiny bit of signal from an otherwise noisy system. The image at the top of this post is taken from the <a href="http://www.isg.rhul.ac.uk/tls/TLStiming.pdf" rel="nofollow">Lucky 13 Attack</a> which was able to discern timing variability of as little as a microsecond measured over a noisy network full of random delays and jitter.</p>
<p>To measure with that degree of precision, we need to use the CPU cycle counters that are built into modern CPUs, such as the Time Stamp Counter (TSC) on Intel CPUs. These counters can give us a much more precise picture of what’s happening than the typical wall clock measurements you might be familiar with. How precise? On a 1GHz CPU, each cycle takes a nanosecond, and since modern CPUs typically run faster than 1GHz, a cycle takes even less than that. If we’ve implemented a cipher correctly, then each time we use it, no matter what the values of the inputs are, it should run in exactly the same number of clock cycles.</p>
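<p>As a rough sketch of what cycle-level measurement looks like (assuming an x86_64 CPU; the <code>_rdtsc</code> intrinsic is real, but serious measurement also needs serializing instructions, a pinned clock frequency, and statistics over many samples):</p>
<pre><code class="rust">#[cfg(target_arch = "x86_64")]
fn cycles_for&lt;F: FnOnce()&gt;(f: F) -&gt; u64 {
    use std::arch::x86_64::_rdtsc;
    // Read the Time Stamp Counter before and after the operation.
    let start = unsafe { _rdtsc() };
    f();
    let end = unsafe { _rdtsc() };
    end - start
}

#[cfg(target_arch = "x86_64")]
fn main() {
    // If an operation is truly constant-time, this count should not vary
    // with its (secret) inputs, only with measurement noise.
    let t = cycles_for(|| {
        std::hint::black_box([42u8; 64].iter().fold(0u8, |acc, b| acc ^ b));
    });
    println!("{} cycles", t);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {} // other architectures expose their own cycle counters
</code></pre>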
<p>Dan Bernstein, the cryptographer who originally created the “CREAM” attack, produced a set of images that <a href="http://cr.yp.to/mac/variability1.html" rel="nofollow">visualize timing variability in various cryptographic implementations</a> when measured at the level of individual CPU cycles. Ideally, in these images, we would see a uniform grid, revealing no information:</p>
<p><a href="http://img.svbtle.com/t6zyrbfutovqng.gif" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/t6zyrbfutovqng_small.gif" alt="v1-thoth-athlon-2.gif"></a> </p>
<p>Unfortunately, when measured this way, OpenSSL’s AES implementation was quite a bit less uniform-looking:</p>
<p><a href="http://img.svbtle.com/gjbzw2fqdsanjw.gif" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/gjbzw2fqdsanjw_small.gif" alt="v1-thoth-openssl-2.gif"></a></p>
<p>Until we actually measure, it’s difficult to know if an implementation is <em>actually</em> constant time, and even then our measurements only apply to the microarchitecture of the CPU we measured on. If we’re trying to write portable code, we might discover that on a certain platform the compiler has discovered a “weird trick” which would normally make a program more efficient, but in a cryptographic context leaks information. To measure effectively, we need to do it on a wide range of CPUs.</p>
<p>So finally, how can we avoid <em>those</em> problems? Depending on the context, we might just have to knuckle down and implement some CPU-specific assembly. For example, Galois Counter Mode (GCM), a popular AES “mode of operation” (and the one you should probably be using if you use AES), relies on something known as finite field multiplication, which is difficult to implement in constant time in software. Fortunately, Intel added a set of CPU instructions known as CLMUL which can be used to implement GCM in both a fast and constant-time manner.</p>
<p>And there we have it: if we want to stand any chance of a cryptographic implementation not getting CREAMed in the future, we need to follow the coding rules, measure our implementations on a wide variety of architectures, and use constant-time assembly implementations of certain primitives when needed.</p>
<p>Unless you do all of these things, watch out: you might get CREAMed.</p>
<p><em><sup>†</sup>SSL has technically been renamed Transport Layer Security (TLS) by the people who standardized it, despite the fact that it actually operates on the layers above the transport layer in the OSI network model. Not only did the people behind TLS confuse everyone by renaming it, but the new name inaccurately describes what the protocol does. What a mess.</em></p>
<h1>What's wrong with in-browser cryptography?</h1>
<p><em>December 30, 2013</em></p>
<p><a href="http://img.svbtle.com/hvcrmeegvjrczw.jpg" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/hvcrmeegvjrczw_small.jpg" alt="JSCryptoProblem.jpg"></a>
<sub><sup>Above image taken from Douglas Crockford’s <em><a href="https://www.youtube.com/watch?v=zKuFu19LgZA" rel="nofollow">Principles of Security</a></em> talk</sup></sub></p>
<p>If you’re reading this, then I hope that sometime somebody or some web site told you that doing cryptography in a web browser is a bad idea. You may have read “<a href="http://www.matasano.com/articles/javascript-cryptography/" rel="nofollow">JavaScript Cryptography Considered Harmful</a>”. You may have found it a bit dated and dismissed it.</p>
<p>You may have read about <a href="http://www.w3.org/TR/WebCryptoAPI/" rel="nofollow">WebCrypto</a> and what it hopes to bring to the browser ecosystem. This particular development may make you feel that it’s okay to start moving various forms of cryptography into the browser.</p>
<p>Why not put cryptography in the browser? Isn’t it inevitable? This is a perpetual refrain from various encryption products which target the browser (names and addresses intentionally omitted). While the smarter ones try to mitigate certain classes of attacks by shipping as browser extensions rather than just a web site that a user types into their address bar, there is definitely a push to a model where you can get the latest greatest crypto code by typing a friendly address into your URL bar.</p>
<p>What’s wrong with this? And will WebCrypto fix it? I don’t think so. Let’s look at the good, the bad, and the ugly of in-browser cryptography and the WebCrypto API.</p>
<h1>The Good: The Normative Parts</h1>
<p>Like many W3C standards, the normative parts of the specification are agnostic to specific algorithms. This is described in <a href="http://www.w3.org/2012/webcrypto/WebCryptoAPI/#algorithms" rel="nofollow">section 24.1 of the WebCrypto specification</a>:</p>
<blockquote class="large">
<p>This section is non-normative</p>
<p>As the API is meant to be extensible in order to keep up with future developments within cryptography and to provide flexibility, there are no strictly required algorithms. Thus users of this API should check to see what algorithms are currently recommended and supported by implementations.</p>
</blockquote>
<p>So, in fact, the W3C is not telling us what algorithms to use at all. Instead, the normative parts of the specification cover abstract APIs for things like generating secure random numbers, managing keys, encrypting/decrypting, backgrounding computation inside workers, and abstract types that can be used with a variety of algorithms.</p>
<p>In that regard, the normative parts of the specification are totally fine. While the spec doesn’t cover it, the APIs seem sufficiently abstract to allow them to easily map onto future encryption algorithms and trusted platform modules (TPMs) which could provide secure storage for encryption keys.</p>
<h1>The Bad: Failure to Provide Normative Advice on Algorithms</h1>
<p>The W3C has elected to make advice on algorithms a non-normative part of the specification. This leaves browser vendors without any specific standards upon which we can build an interoperable cryptographic ecosystem for the web. Instead, the <a href="http://www.w3.org/TR/WebCryptoAPI/#algorithms" rel="nofollow">section on algorithms</a> lists a bunch of examples of common algorithms and how they can be mapped onto WebCrypto’s APIs.</p>
<p>Browsers already ship portable versions of a large number of cryptographic algorithms as part of their TLS stacks. Without normative guidance from the WebCrypto specification itself, what is likely to happen is that browsers will simply expose the algorithms in their TLS stacks directly to web content.</p>
<p>Some of them are fairly good (e.g. AES-GCM), but many of them are dangerous if used improperly. Pretty much every other symmetric cipher they list besides AES-GCM is not an <a href="http://tonyarcieri.com/all-the-crypto-code-youve-ever-written-is-probably-broken/" rel="nofollow">authenticated encryption mode</a>, and in the hands of amateurs these modes are akin to handling plutonium.</p>
<p>Without someone providing normative advice that all browser vendors can adhere to, my worry is that the WebCrypto ecosystem will fragment and fail to agree on particular standards. My advice to the W3C is to <a href="http://blog.cryptographyengineering.com/2012/12/the-anatomy-of-bad-idea.html" rel="nofollow">listen to cryptography expert Matt Green’s advice</a> and provide a normative list of <em>authenticated</em> encryption algorithms (and <em>only</em> authenticated encryption algorithms) that all browsers should support. AES-GCM would be a good start.</p>
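<p>To show what’s at stake with authenticated encryption, here’s a minimal sketch using Rust and the RustCrypto <code>aes-gcm</code> crate rather than JavaScript (purely for illustration; the API shown is that crate’s, not WebCrypto’s). The property that matters is the failure mode: tampered ciphertext is rejected outright instead of quietly decrypting to garbage.</p>
<pre><code class="rust">use aes_gcm::aead::{Aead, AeadCore, KeyInit, OsRng};
use aes_gcm::Aes256Gcm;

fn main() {
    let key = Aes256Gcm::generate_key(OsRng);
    let cipher = Aes256Gcm::new(&amp;key);
    // GCM nonces must never repeat under the same key.
    let nonce = Aes256Gcm::generate_nonce(&amp;mut OsRng);
    let ciphertext = cipher
        .encrypt(&amp;nonce, b"attack at dawn".as_ref())
        .expect("encryption failure");

    // Flip a single bit: authenticated decryption now fails loudly, where
    // an unauthenticated mode would silently return corrupted plaintext.
    let mut tampered = ciphertext.clone();
    tampered[0] ^= 1;
    assert!(cipher.decrypt(&amp;nonce, tampered.as_ref()).is_err());

    let roundtrip = cipher.decrypt(&amp;nonce, ciphertext.as_ref()).unwrap();
    assert_eq!(roundtrip, b"attack at dawn".to_vec());
}
</code></pre>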
<h1>The Ugly: We’re Still In a Browser</h1>
<blockquote class="large">
<p>“The browser knows that the program does not represent the user” - Douglas Crockford</p>
</blockquote>
<p>There is no beating around the bush: the browser is a sandbox that attempts to let you dynamically download and run potentially malicious code from a server on-the-fly. Web browsers are a deliberately designed engine for remote code execution, a term which strikes fear into the hearts of information security professionals worldwide.</p>
<p>If ample precautions are taken (which includes a large laundry list of things like TLS, CSP, CORS, proper HTTP headers, JS strict mode, and more), this can allow for the successful development of cryptographic applications that attempt to enforce the interests of the web application creator. But what about the user?</p>
<p>Do programs in the browser represent the interests of the user? According to Commander Douglas Crockford (image at the top of this post) the answer is a resounding NO. This is not the traditional threat model of the browser. </p>
<p>Where installation of native code is increasingly restrained through the use of cryptographic signatures and software update systems which check multiple digital signatures to prevent compromise (not to mention the browser extension ecosystems which provide similar features), the web itself just grabs and implicitly trusts whatever files it happens to find on a given server at a given time.</p>
<p>The threat model of native code is now well-understood and increasingly addressed through more <a href="https://updateframework.com/" rel="nofollow">sophisticated software installation and update systems</a>. Native code releases are artifacts at a point in time, don’t change dynamically, and can therefore be audited and given approval by experts (who ideally have access to the source code and can <a href="http://it.slashdot.org/story/13/10/24/169257/how-i-compiled-truecrypt-for-windows-and-matched-the-official-binaries" rel="nofollow">match the official binaries</a>). This is not the case for the web platform.</p>
<p>The convenience of the web stems from the fact that it’s a frictionless application delivery platform. Unfortunately, it does not rely on a comprehensive cryptographically secure signature system to determine that content is authentic, but instead just trusts whatever is sitting around on the server at the time you access it. This is worsened by the fact that web browsers give remote servers access to wide-ranging local capabilities exposed via HTML and JavaScript. This creates an environment that is not particularly safe or stable for use in creating, storing, or sharing encryption keys or encrypted messages.</p>
<p>Before I keep talking about where in-browser cryptography is inappropriate, let me talk about where I think it might work: I think it has great potential uses for encrypting messages sent between a user and the web site they are accessing. For example, my former employer LivingSocial used in-browser crypto to <a href="https://www.braintreepayments.com/braintrust/client-side-encryption" rel="nofollow">encrypt credit card numbers in-browser with their payment processor’s public key before sending them over the wire</a> (via an HTTPS connection which effectively double-encrypted them). This provided end-to-end encryption between a user’s browser and LivingSocial’s upstream payment gateway, even after HTTPS had been terminated by LivingSocial (i.e. all cardholder data seen by LivingSocial was encrypted).</p>
<p>In this approach, there’s an implicit trust relationship between the user and the site they’re accessing. What we see happening here is cryptography being used to protect the web site’s interests, <em>not</em> the user’s. For this purpose, in-browser crypto is great!</p>
<p>Where the web encryption model fails is when we want to provide a “Trust No One” service which protects the user’s interests, for example the MEGA storage service which uses in-browser crypto. In this sort of scenario, we have MEGA wanting to act as a sort of dumb store for encrypted data, and have them never see plaintexts or encryption keys. Such a service would, ideally, pass what cryptography expert Matt Green calls the “<a href="http://blog.cryptographyengineering.com/2012/04/icloud-who-holds-key.html" rel="nofollow">mud puddle test</a>”, where a person who has a particularly bad run-in with a mud puddle and loses their personal copies of encryption keys can’t ask the service to give them back, since the service itself doesn’t hold onto them.</p>
<p>However, this approach just doesn’t work in a browser, as illustrated by the <a href="http://nzkoz.github.io/MegaPWN/" rel="nofollow">MEGApwn utility</a> for obtaining your MEGA keys. This utility illustrates an important problem with building “Trust No One” services in the browser: anyone who can get JavaScript to run on the same origin as the alleged “Trust No One” service can get access to your encryption keys. WebCrypto’s mechanisms for secure key storage can mitigate this partially, but an attacker can still <a href="https://github.com/koto/mosquito" rel="nofollow">utilize your keys remotely</a>. Furthermore, MEGA was designed as a file sharing service, and for that to work it needs direct access to encryption keys so you can share them with other people.</p>
<p>MEGA has gone to great lengths to try to mitigate traditional XSS-style threats (making <a href="http://fail0verflow.com/blog/2013/megafail.html" rel="nofollow">several mistakes along the way</a> and earning Kim Dotcom the title of Security Charlatan of the Year at the <a href="https://www.youtube.com/watch?v=pIGejjv8Gt8#t=42m41s" rel="nofollow">2013 DEFCON Recognize Awards</a>), but no matter how hard they try this won’t change the fact that the security of the entire system is predicated on the security of MEGA’s JavaScript files at the time you happen to load their site (specifically the “SecureBoot.js” file in the case of MEGA).</p>
<p>The potential attacks are numerous: hackers (or governments) could compromise MEGA’s servers and change the file. A MEGA insider could place a malicious payload inside this file. Governments could coerce MEGA into placing a malicious payload inside the file. Or MEGA could just decide they want to grab everyone’s keys. If any of these things were to happen, the security of the entire system has been lost.</p>
<p>The web’s dynamic nature precludes our only defense against these sorts of attacks: audits by security experts. Even if crypto experts were to audit MEGA’s SecureBoot.js and give it a clean bill of health, there’s nothing to stop anyone who has sufficient access from injecting a malicious payload into it at any point in time. They could even selectively target users, so the rest of the world would still think it’s fine, but a particular victim would receive the malicious payload.</p>
<p>One way to mitigate this is to use browser extensions, which provide cryptographically signed software updates in a way more akin to traditional native code applications, helping mitigate the “just grab the latest code off the server any time I access the site” problem. Browser extensions have <a href="http://www.slideshare.net/kkotowicz/im-in-ur-browser-pwning-your-stuff-attacking-with-google-chrome-extensions" rel="nofollow">problems of their own</a>, but they do move the security bar forward over a traditional web page.</p>
<h1>HTTPS Doesn’t Solve This Problem</h1>
<p>Some of you might be thinking “if I use HTTPS, isn’t the content signed by the server?” It’s true that, after years of resolving mistakes and design flaws, and when the certificate you’re trusting hasn’t been compromised, HTTPS will ensure the integrity of the content between the remote web server and your web browser. Modern browsers support AES-GCM, which is particularly good at this.</p>
<p>However, HTTPS was designed to protect what’s known as “data-in-motion”. This means that HTTPS servers use online keys, which are not only more easily compromised, but are specifically designed to make it easy to send data back to users dynamically.</p>
<p>This means compromising a JavaScript file can involve little more than obtaining write access to a site’s static files (potentially through security vulnerabilities in a buggy web application) or obtaining CDN credentials (through similar channels). No key compromise is necessary to perform an attack, but since the keys are online the risk of key compromise is higher.</p>
<p>Better software update systems are specifically designed to protect “data-at-rest”, which is how build artifacts of native applications, and even a web site’s static assets, should be thought of. The advantage of data-at-rest is that it can be signed by offline keys (or a combination of offline and online keys) which are much more difficult to compromise.</p>
<p>For more information on the problems of using HTTPS alone in the hopes of building a secure software delivery system, see section 4.1 “PKI Vulnerabilities” in the <a href="http://freehaven.net/%7Earma/tuf-ccs2010.pdf" rel="nofollow">Survivable Key Compromise In Software Update Systems</a> paper.</p>
<h1>Conclusion</h1>
<p>Cryptography is a systems problem, and the web is not a secure platform for application delivery. The web is a way to easily run untrusted code fetched from remote servers on-the-fly. Building security software inside of web browsers only makes the problem harder.</p>
<p>In-browser crypto is best utilized to help web sites protect their own interests. Sites attempting to build “Trust No One” cryptosystems inside of browsers (especially when not using browser extensions) have a vast attack surface and are fundamentally attempting to use the browser for something it wasn’t designed for: creating software that respects the user’s interests, not the web site provider’s.</p>
<p>Instead, prefer either browser extensions or open source native tools. Look in particular for tools that have been audited by security professionals, and in the case of native code apps look for tools with binaries that can be reproduced from the original source code. Scrutiny by experts is paramount in making sure software is secure, and the web, as it exists today, makes this sort of scrutiny impossible.</p>
<p>For additional examples of the challenges of building a secure client-side JavaScript crypto application, check out Krzysztof Kotowicz’s “<a href="http://koto.github.io/blog-kotowicz-net-examples/keys-to-kingdom/" rel="nofollow">Keys to a Kingdom</a>” challenge. It’s a great illustration of the sorts of problems that can arise when building web-based encryption applications.</p>
<h1>Imperfect Forward Secrecy: The Coming Cryptocalypse</h1>
<p><em>July 9, 2013</em></p>
<p>If the Snowden debacle has accomplished anything, it’s raising public awareness of cryptography. This has spawned some sort of meme that <a href="http://blogs.computerworld.com/encryption/22366/can-nsa-see-through-encrypted-web-pages-maybe-so" rel="nofollow">if we use “perfect forward secrecy”, our communications are protected from the NSA</a>! Various blog posts and articles in major newspapers have conferred all sorts of magical secret powers onto perfect forward secrecy, specifically surrounding its NSA-fighting powers, going as far as to say that <a href="http://www.washingtonpost.com/blogs/wonkblog/wp/2013/06/14/nsa-proof-encryption-exists-why-doesnt-anyone-use-it/" rel="nofollow">perfect forward secrecy is NSA-proof</a>.</p>
<p>First off, let me say that forward secrecy is great, and you should try to deploy it if you can. However, even if you run a stack that supports it (many hardware SSL terminators do not, for example), it’s still <a href="https://www.imperialviolet.org/2013/06/27/botchingpfs.html" rel="nofollow">pretty hard to implement properly</a> (you need to rotate session ticket keys frequently to gain anything from it, for example).</p>
<p>But let’s not beat around the bush: perfect forward secrecy isn’t some magical silver bullet against the NSA. To understand why we need to take a look at what it actually does: generate short-term public keys that are used to establish a shared secret between two parties, then sign these short-term keys with long-term keys that tie back into PKI certificate chains we can use to validate them. Since the long-term private keys are only used for creating digital signatures, and not encrypting the session, future compromises of the long-term keys will not affect the security of past sessions, since the past sessions were protected by the (randomly generated) short-term keys.</p>
<p>Forward secrecy comes from the fact that shortly after the session is established (sometimes days, sometimes minutes) the private keys are forgotten by all parties involved. If everyone does their due diligence to ensure all the private keys are destroyed, we are protected from future key compromises, because you can’t compromise what’s not there anymore. Since all parties involved have destroyed the private keys, it should be impossible to recover the session, hence the “perfect” part of perfect forward secrecy, right?</p>
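<p>To ground that description, here’s a toy sketch of the ephemeral Diffie-Hellman exchange underlying forward secrecy, with deliberately tiny, insecure parameters and fixed “random” exponents so it fits on the page:</p>
<pre><code class="rust">// Square-and-multiply modular exponentiation: base^exp mod modulus.
fn modpow(mut base: u64, mut exp: u64, modulus: u64) -&gt; u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp &gt; 0 {
        if exp &amp; 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp &gt;&gt;= 1;
    }
    acc
}

fn main() {
    // Public parameters (toy-sized; real DH uses far larger groups).
    let (p, g) = (2_147_483_647u64, 5u64);

    // Each party generates a fresh short-term private key per session.
    let a = 123_456u64; // Alice's ephemeral secret
    let b = 654_321u64; // Bob's ephemeral secret

    // Only the public halves cross the wire (signed by long-term keys).
    let pub_a = modpow(g, a, p);
    let pub_b = modpow(g, b, p);

    // Both sides arrive at the same shared secret: g^(a*b) mod p.
    assert_eq!(modpow(pub_b, a, p), modpow(pub_a, b, p));

    // Now a and b are erased. A future compromise of the long-term
    // signing keys reveals nothing about this session's secret.
}
</code></pre>
<p>Note what travels in the clear: <code>pub_a</code> and <code>pub_b</code>. That’s the catch the next paragraphs describe: anyone who records them and can someday solve the discrete logarithm problem can reconstruct the shared secret after the fact.</p>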
<p>Wrong. There’s a problem: in order to use a short-term key to establish a session, we need to transmit it to another party in plaintext. This is, after all, a public key we use to set up the actual encrypted session, so at this point we don’t really have any way of encrypting it. Signing it takes care of ensuring the key is authentic, trusted, and otherwise unmanipulated by a malicious middleman, but we have no way to keep it confidential if we want to use it to set up a session with another party and we have no shared secrets.</p>
<p>So we send it in the clear. What’s wrong with that? Isn’t that what public key cryptography is all about? Well, there’s a problem: the NSA has just sniffed the short-term public key that was used to establish a particular session. If you are a Person of Interest, they could easily sniff all of the encrypted packets that make up a particular session. And not just for that session: they could repeat this for all of your Internet communications, including things like PGP encrypted emails (of the sort that <a href="http://www.huffingtonpost.com/2013/06/10/edward-snowden-glenn-greenwald_n_3416978.html" rel="nofollow">Edward Snowden sent to Glenn Greenwald</a>).</p>
<p>They are building a <a href="https://en.wikipedia.org/wiki/Utah_Data_Center" rel="nofollow">massive datacenter in Utah</a> that’s rumored to have an <a href="http://www.time.com/time/health/article/0,8599,1970849,00.html" rel="nofollow">asston of storage</a>, certainly enough for them to suck down all the Internet traffic of people they’re suspicious of, and they’ve got <a href="https://en.wikipedia.org/wiki/Room_641A" rel="nofollow">beam splitters giving them access to backbone Internet traffic in 10 major datacenters around the country</a>. If they find you suspicious enough to want to archive all your traffic, and you talk over anything they have tapped, they can easily download and store everything you do online.</p>
<p>If the NSA can break the public keys used for your sessions, and derive their corresponding private keys, it can decrypt all of your sessions, regardless of whether you used “perfect” forward secrecy.</p>
<p>How can the NSA break your public key and derive the corresponding private key? If the NSA wanted to break one of your HTTPS sessions with <a href="https://encrypted.google.com" rel="nofollow">https://encrypted.google.com</a>, which uses ECDHE for forward secrecy, they would need to break the ephemeral Diffie-Hellman key associated with that session. To do that, they would need some way of solving the <a href="https://en.wikipedia.org/wiki/Discrete_logarithm" rel="nofollow">discrete logarithm problem</a>: the problem which makes public key cryptography hard in one direction (deriving private keys from public ones is impractical) but easy in the other (calculating public keys from private ones is fast), particularly with the elliptic curve cryptography ECDHE uses.</p>
<p>ECDHE works great today, but for how long? Unfortunately, there’s doom on the horizon, not just for ECDHE, but for RSA, DSA, and practically all forms of public key cryptography we use today. That doom comes in the form of <a href="https://en.wikipedia.org/wiki/Quantum_computer" rel="nofollow">quantum computers</a>.</p>
<p>Quantum computers are odd beasts that need very specialized programs in order to work efficiently. However, one of the areas where they can be extremely effective is cryptography. <a href="https://en.wikipedia.org/wiki/Shor&#39;s_algorithm" rel="nofollow">Shor’s algorithm</a>, devised in 1994, can leverage the potential power of quantum computers to efficiently factor large numbers (and compute discrete logarithms), and thus break public keys and derive the private ones. Unfortunately, at the time it was created, quantum computers didn’t exist.</p>
<p>Fast forward to today: quantum computing has been taking its first baby steps into practicality in the form of the <a href="http://www.slate.com/blogs/future_tense/2013/05/16/google_nasa_buy_d_wave_2_quantum_computer_what_will_they_do_with_it.html" rel="nofollow">512-qubit D-Wave 2</a> (where a <a href="https://en.wikipedia.org/wiki/Qubit" rel="nofollow">qubit</a> is the quantum analogue of a bit). Some of the first commercial quantum computers were just sold to Google and NASA. You might relax for now, because the D-Wave 2 doesn’t come close to having what it takes to challenge modern public key cryptography, which would require several orders of magnitude more qubits. But it was only last year that <a href="http://www.technologyreview.com/view/426586/worlds-largest-quantum-computation-uses-84-qubits/" rel="nofollow">an 84-qubit computer was making headlines</a>.</p>
<p>We’ll see if there’s a Moore’s Law of quantum computing or not, but given past trends it seems like a reasonable wager that quantum computers will continue to get more powerful, and perhaps exponentially so like we saw with transistor-based computers. D-Wave’s (undoubtedly optimistic) CEO claims <a href="http://www.nature.com/news/computing-the-quantum-company-1.13212" rel="nofollow">“In 10 years’ time, I’d be hugely disappointed if we didn’t have a machine capable of factoring a 1,000-bit number, involving millions of qubits”</a>.</p>
<p>What happens in 10-20 years (or what have you) when quantum computers, like their classical predecessors, are no longer confined to the labs of NASA (or the NSA), and do start reaching the millions of qubits range at which they become practically useful for breaking public keys, at least the kind we use today? When this happens, it means that <em>anyone</em> (not just the NSA!) who can get their hands on a quantum computer, and has captured one of your HTTPS sessions, can break your session and recover the plaintext. Perfect forward secrecy be damned!</p>
<p>When that happens, we’ll have entered the age of <a href="http://pqcrypto.org/" rel="nofollow">post-quantum cryptography</a>:</p>
<blockquote>
<p>Imagine that it’s fifteen years from now. Somebody announces that he’s built a large quantum computer. RSA is dead. DSA is dead. Elliptic curves, hyperelliptic curves, class groups, whatever, dead, dead, dead. So users are going to run around screaming and say “Oh my God, what do we do?” Well, we still have secret-key cryptography, and we still have some public-key systems. There’s hash trees. There’s NTRU. There’s McEliece. There’s multivariate-quadratic systems. But we need more experience with these. We need algorithms. We need paddings, like OAEP. We need protocols. We need software, working software for these systems. We need speedups.</p>
</blockquote>
<p>In order to defeat quantum computers, we will need to switch to a new class of algorithms built on problems believed to remain hard even for quantum computers, such as <a href="https://en.wikipedia.org/wiki/McEliece_cryptosystem" rel="nofollow">McEliece (which is based on decoding linear codes, an NP-hard problem)</a> or <a href="https://en.wikipedia.org/wiki/Lamport_signature" rel="nofollow">Lamport signatures (which are based on hash functions)</a>. Switching to these schemes today has a number of drawbacks: no good implementations, giant keys, and in the case of Lamport signatures the fact that each keypair can only sign a single message.</p>
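<p>For a feel of how hash-based signatures work, here’s a toy sketch of a Lamport one-time signature (my own illustration, assuming the third-party <code>sha2</code> and <code>rand</code> crates; emphatically not production code):</p>
<pre><code class="rust">use sha2::{Digest, Sha256};

fn main() {
    // Private key: 256 pairs of random 32-byte secrets.
    let sk: Vec&lt;[[u8; 32]; 2]&gt; =
        (0..256).map(|_| [rand::random(), rand::random()]).collect();

    // Public key: the hash of every secret (this is what you publish).
    let pk: Vec&lt;[[u8; 32]; 2]&gt; = sk
        .iter()
        .map(|pair| [Sha256::digest(&amp;pair[0]).into(), Sha256::digest(&amp;pair[1]).into()])
        .collect();

    // Sign: for each bit of H(message), reveal one secret from that pair.
    let msg_hash = Sha256::digest(b"hello post-quantum world");
    let signature: Vec&lt;[u8; 32]&gt; = (0..256usize)
        .map(|i| sk[i][((msg_hash[i / 8] &gt;&gt; (i % 8)) &amp; 1) as usize])
        .collect();

    // Verify: every revealed secret must hash to the published half
    // selected by the corresponding message-hash bit.
    let valid = (0..256usize).all(|i| {
        let bit = ((msg_hash[i / 8] &gt;&gt; (i % 8)) &amp; 1) as usize;
        Sha256::digest(&amp;signature[i])[..] == pk[i][bit][..]
    });
    assert!(valid);

    // The catch: signing revealed half of each pair, so this keypair is
    // now spent. Sign a second message and forgeries become possible.
}
</code></pre>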
<p>The good news is that <a href="http://cr.yp.to/" rel="nofollow">Dan Bernstein (owner of the cr.yp.to domain, what other credentials do you need?)</a> has been working on a practical post-quantum public key encryption system called <a href="http://binary.cr.yp.to/mcbits-20130616.pdf" rel="nofollow">McBits</a>. More good news: quantum computers suck at breaking symmetric encryption, which is great if you just want to encrypt something with a password or a key you’ve shared in advance, but unfortunately most of the time symmetric keys are established using the kinds of public key algorithms that quantum computers can destroy.</p>
<p>If we really want to keep our communications secure from the NSA, what do we need to do? All this talk about post-quantum cryptography might scare you, and maybe we do need to jump on better algorithms soon, but there are way bigger problems that might give the NSA access to your data today, like <a href="http://arstechnica.com/security/2013/06/guardian-reporter-delayed-e-mailing-nsa-source-because-crypto-is-a-pain/" rel="nofollow">sending email in plaintext because encryption apps are too damn hard to use</a>, or the fact that computer systems are riddled with all sorts of vulnerabilities (<a href="http://arstechnica.com/security/2013/06/nsa-gets-early-access-to-zero-day-data-from-microsoft-others/" rel="nofollow">which companies like Microsoft handed over to the NSA before they were made public, which could potentially be used to break into the computers of Persons of Interest</a>). Cryptography won’t help you if someone can break into your computer and steal all the plaintexts right out of your computer’s RAM.</p>
<p>All that said, does truly perfect, NSA-proof encryption exist? Yes: it’s called the <a href="https://en.wikipedia.org/wiki/One-time_pad" rel="nofollow">one-time pad</a>, in which data is encrypted with a single-use pad of the same size by performing an XOR operation. The one-time pad is the only truly perfect encryption system, and with the continuing growth in capacity of things like USB keychain drives, it’s a great practical choice if you want truly confidential (so long as you have truly random numbers), futureproof (so long as you don’t lose or reuse your keys!) encrypted conversations with anyone you can swap USB keychains with in person. Reuse the pad though, and it’s all over!</p>
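<p>The scheme really is as simple as it sounds; a minimal sketch with a hardcoded pad (in real use the pad must come from a true random source, be as long as the message, stay secret, and never be reused):</p>
<pre><code class="rust">// XOR each data byte with the corresponding pad byte. Running the same
// operation again with the same pad decrypts, since x ^ p ^ p == x.
fn otp_xor(data: &amp;[u8], pad: &amp;[u8]) -&gt; Vec&lt;u8&gt; {
    assert_eq!(data.len(), pad.len(), "pad must be as long as the data");
    data.iter().zip(pad.iter()).map(|(d, p)| d ^ p).collect()
}

fn main() {
    let plaintext = b"attack at dawn";
    // Hardcoded for illustration only: a real pad must come from a true
    // random source, stay secret, and never be used twice.
    let pad: [u8; 14] = [
        0x8f, 0x13, 0xa4, 0x5e, 0x01, 0xc7, 0x99, 0x2b,
        0x60, 0xd2, 0x7f, 0x44, 0xe8, 0x3c,
    ];
    let ciphertext = otp_xor(plaintext, &amp;pad);
    let recovered = otp_xor(&amp;ciphertext, &amp;pad);
    assert_eq!(recovered, plaintext.to_vec());
}
</code></pre>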
<h1>The cloud isn't dead. It just needs to evolve</h1>
<p><em>June 11, 2013</em></p>
<p>At my previous job my daily commute took me past 611 Folsom Street in San Francisco. This building is infamous for being the home of <a href="http://en.wikipedia.org/wiki/Room_641A" rel="nofollow">Room 641A</a>, where a whistleblower, Mark Klein, revealed that the NSA had created a secret room and placed beam splitters on fiber optic cables carrying Internet backbone traffic.</p>
<p>I heard Jacob Appelbaum mention this building’s street address in his <a href="http://www.youtube.com/watch?v=QNsePZj_Yks" rel="nofollow">29c3 keynote</a>, along with the <a href="http://en.wikipedia.org/wiki/Utah_Data_Center" rel="nofollow">NSA’s Utah Datacenter</a>, where wiretapped traffic from this room was allegedly funneled and stored. Storing that much traffic sounds daunting, but according to NSA whistleblower William Binney, the datacenter has a “yottabyte” scale capacity intended to archive all the traffic they can collect for up to 100 years. From then on I got a little bit preoccupied with this building every time I walked past it on my daily commute.</p>
<p>It’s a building I took pictures of and tweeted about, like this picture from February:</p>
<p><a href="http://t.co/aZmsr3lNsL" rel="nofollow"><img src="http://distilleryimage4.ak.instagram.com/95f937647bd411e2bb8a22000a1f97fc_7.jpg" alt="611 Folsom"></a></p>
<p><img src="http://img.svbtle.com/bascule_24676670864358.png" alt="Tweet"></p>
<p>The idea of an adversary eavesdropping on your traffic is central to cryptography. Cryptographers generally bestow the name “Eve” on this adversary, although the EFF has recently abandoned this pretense and has started producing diagrams which <a href="https://www.eff.org/pages/tor-and-https" rel="nofollow">identify this adversary as the NSA</a>. For the sort of beam-splitter based, observe-but-do-not-interfere style of traffic snooping the NSA is performing, it is pretty much acting as the textbook example of “Eve”. However, given the vast capacity of the Utah Data Center, the NSA has the ability to be an incredibly patient Eve, and can leverage all sorts of things like future key compromises, and even future algorithm compromises (on a 100 year timespan), to break older traffic dumps.</p>
<p>I’m the sort of person who spends far too much of my day aimlessly thinking about cryptography, and walking past 611 Folsom Street every day couldn’t help but set my thoughts towards how to defeat the NSA with better cryptography.</p>
<p>The idea of trying to out-crypto a state level adversary with seemingly boundless funding, resources, and expert personnel on their hands might seem a little absurd, and it should be. For an expert look at the problem of whether we can defeat a state-level adversary with cryptographic applications, we can look to Matt Green’s blog post <a href="http://blog.cryptographyengineering.com/2013/03/here-come-encryption-apps.html" rel="nofollow">Here come the encryption apps!</a>. Matt does a “Should I use this to fight my oppressive regime?” evaluation of several cryptographic applications and determines that only one of them, RedPhone by Moxie Marlinspike’s Whisper Systems, fits the bill.</p>
<p>Securing our Internet traffic from an “Eve” like the NSA is a daunting challenge. But it’s one I feel it’s worth working on…</p>
<h1>Trust No One</h1>
<p>“The Cloud” as a concept has somewhat… fluffy security properties (please pardon the pun). We can encrypt data in the cloud, but encryption in and of itself isn’t particularly helpful. It’s particularly problematic if we trust the same people to store our personal data encrypted and also trust them to hold the encryption keys to that data. If they’re doing both, they can decrypt our personal ciphertexts on a whim.</p>
<p>Matt Green defines a “<a href="http://blog.cryptographyengineering.com/2012/04/icloud-who-holds-key.html" rel="nofollow">mud puddle test</a>” for cloud services: let’s say you slip in a mud puddle, which destroys both the brain cells which store a password to your data, and a backup of your password you kept on your phone when the water seeps in and fries its circuit board. Can you still access your data somehow? If so, congratulations, you have failed the mud puddle test.</p>
<p>What’s the problem? In order to get access to your data again, someone else must’ve held onto your key, and you are therefore trusting that someone with the security of your data. Many services today might encrypt your data server-side using a key they know (and perhaps only they know!). Your data may reach their service encrypted over SSL, but they’re terminating the SSL on their end and are able to see all of your plaintext as it passes through their servers. If a state level adversary like the NSA has wormed their way in and requested a backdoor, then your cloud provider is able to tell the NSA anything they want.</p>
<p>The solution to this problem is to encrypt everything client side using a key which is only known to the owner of the data. Before the data ever leaves a single computer, it needs to be encrypted with a key known only by the data’s owner. None of this “decrypt it server-side then re-encrypt it” business will do, nor will storing unencrypted keys with a third party. If you want to ensure your data remains confidential, it must be encrypted with a key known only to you.</p>
<p>Without the guarantee that you are the one and only one holder of the encryption key to a particular piece of data, you have absolutely no assurances that your data is being kept confidential, and that it is not being observed by the NSA, or for that matter… the entire rest of the world.</p>
<p><a href="http://img.svbtle.com/bascule_24676722211212.png" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/bascule_24676722211212_small.png" alt="You Don&#39;t Understand The Internet"></a></p>
<h1>Secure Cloud Storage</h1>
<p>What does “the Cloud” really mean? Who knows. But as far as I can tell, one of the things it does is store data: your personal data, others’ personal data, businesses’ data. Sensitive data of all sorts. How much of this is stored in plaintext, or with keys held directly by the cloud provider?</p>
<p>We are at the whims of whatever cloud services we use when it comes to how our data is stored. But once you understand the “mud puddle test”, it becomes quite clear that <em>where</em> your data is stored is irrelevant: if you encrypt data before putting it in the cloud, you hold all the keys, and your data is protected by cryptography rather than by someone else’s policy. Any country-specific government snooping should be irrelevant if you can trust cryptography to keep your data secure. There shouldn’t be any worries about trusting nodes in Geneva, San Francisco, Moscow, Beijing, or Tehran: cryptography keeps the data secure wherever it lives.</p>
<p>You’d think by now distributed secret storage and sharing would be a solved problem. Unfortunately no one system for generally solving this problem has ever gained traction. This is something I have personally been longing for since I was a high school student playing around with <a href="http://en.wikipedia.org/wiki/Mnet_(peer-to-peer_network)" rel="nofollow">MojoNation</a>, and I would later discover the concept existed much earlier in <a href="http://en.wikipedia.org/wiki/Project_Xanadu" rel="nofollow">Project Xanadu</a>, which dates back to the 1960s. Some of the 17 rules of Project Xanadu seem relevant to me today:</p>
<ol>
<li>Every Xanadu server is uniquely and securely identified.</li>
<li>Every Xanadu server can be operated independently or in a network.</li>
<li>Every user is uniquely and securely identified.</li>
<li>Every user can search, retrieve, create and store documents.</li>
<li>Every document can consist of any number of parts each of which may be of any data type.</li>
<li>Every document can contain links of any type including virtual copies (“transclusions”) to any other document in the system accessible to its owner.</li>
<li>Links are visible and can be followed from all endpoints.</li>
<li>Permission to link to a document is explicitly granted by the act of publication.</li>
<li>Every document can contain a royalty mechanism at any desired degree of granularity to ensure payment on any portion accessed, including virtual copies (“transclusions”) of all or part of the document.</li>
<li>Every document is uniquely and securely identified.</li>
<li>Every document can have secure access controls.</li>
<li>Every document can be rapidly searched, stored and retrieved without user knowledge of where it is physically stored.</li>
<li>Every document is automatically moved to physical storage appropriate to its frequency of access from any given location.</li>
<li>Every document is automatically stored redundantly to maintain availability even in case of a disaster.</li>
<li>Every Xanadu service provider can charge their users at any rate they choose for the storage, retrieval and publishing of documents.</li>
<li>Every transaction is secure and auditable only by the parties to that transaction.</li>
<li>The Xanadu client-server communication protocol is an openly published standard. Third-party software development and integration is encouraged.</li>
</ol>
<p>These are some rather lofty goals, hardly any of which are met by the hypertext system we use today instead of Xanadu: the web. I believe Xanadu’s original goals are worth revisiting, and that creating a world-scale <a href="http://en.wikipedia.org/wiki/Content-addressable_storage" rel="nofollow">content addressable storage</a> system which is secure, encrypted, and robust is worth pursuing.</p>
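<p>To make “content addressable” concrete: in such a system a document’s address is derived from its (encrypted) contents, so anyone who fetches a blob can verify they received exactly what they asked for, no matter which untrusted node served it. A minimal sketch, with the encryption step elided:</p>
<pre><code class="ruby">require 'digest'

# Store a blob under the hash of its contents: the hash *is* the address
def store(storage, encrypted_blob)
  address = Digest::SHA256.hexdigest(encrypted_blob)
  storage[address] = encrypted_blob
  address
end

# Retrieval is self-verifying: recompute the hash and compare
def fetch(storage, address)
  blob = storage[address]
  raise "corrupt or forged blob!" unless Digest::SHA256.hexdigest(blob) == address
  blob
end

storage = {}  # stand-in for an untrusted remote node
addr = store(storage, "ciphertext goes here")
fetch(storage, addr)  # =&gt; "ciphertext goes here"
</code></pre>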
<h1>Who is working on this problem?</h1>
<p>Perhaps Xanadu’s scope was too vast, and while its goals were admirable, its complexity doomed it to vaporware. Indeed, Xanadu may be the biggest vaporware project in history. There are, however, many people working on limited subsets of the Xanadu problem, attempting to build secure distributed document storage systems. Here are some of the ones I find interesting:</p>
<ul>
<li>
<a href="https://tahoe-lafs.org/trac/tahoe-lafs" rel="nofollow">Tahoe-LAFS</a>: my favorite on the list, Tahoe applies a <a href="http://en.wikipedia.org/wiki/Capability-based_security" rel="nofollow">capability-based security</a> model to the problem of cloud storage, permitting the ability to provide extremely granular access on a read-write, read, or verify level to individual files, directories, and subtrees in a distributed filesystem. I recently contributed some UI improvements to Tahoe and would strongly suggest you check it out. Unfortunately, Tahoe has failed to garner any sort of mainstream traction.</li>
<li>
<a href="https://freenetproject.org/" rel="nofollow">Freenet</a>: a perpetual contender in this space, FreeNet has somewhat similar goals to Tahoe, but on a global scale. Unfortunately, FreeNet has failed to solve problems around accounting, and generally has problems around performance and reliability.</li>
<li>
<a href="http://en.wikipedia.org/wiki/Perfect_Dark_(P2P):%20a%20closed-source%20Japanese%20system,%20Perfect%20Dark%20is%20attempting%20a" title="defense in depth&quot; (otherwise known as &quot;security by obscurity" rel="nofollow">Perfect Dark</a> approach towards frustrating those who would try to undermine its crypto. It boasts a number of impressive features such as mixnets for improving anonymity, but without the ability to audit its source code it’s questionable as to how cryptographically secure it actually is.</li>
<li>
<a href="https://github.com/cryptosphere/cryptosphere" rel="nofollow">The Cryptosphere</a>: my own personal vaporware solution to this problem. It’s received a <a href="http://techcrunch.com/2012/07/31/new-darknet-wants-to-match-up-cypherpunks-in-crypto-utopia/" rel="nofollow">little bit of coverage on TechCrunch</a> despite being largely vaporware, but in the wake of the recent NSA scandals finally dragging their activities into the spotlight of the public consciousness, I have newfound motivation to continue working on it.</li>
</ul>
<h1>Why create something new?</h1>
<p><img src="http://img.svbtle.com/bascule_24676761113730.png" alt="Standards"></p>
<p>I’ll admit that, to a certain degree, the Cryptosphere feels like reinventing the wheel. There are several projects out there that more or less do most of the things I’d like for the Cryptosphere to do. So why bother?</p>
<p>I’d like to take a different approach with the Cryptosphere, one which to my knowledge hasn’t been tried yet, in hopes that I can produce something meaningful to your average person. I feel like the problem with existing P2P systems like the ones I mentioned above is that their value proposition is difficult for the average person to understand. That was <a href="https://github.com/tarcieri/distribustream" rel="nofollow">certainly the case for the last P2P system I tried to write</a>.</p>
<p>I’d like to try to build the secure web from the top down. If you <a href="https://github.com/cryptosphere/cryptosphere" rel="nofollow">look at the existing codebase</a> (which is, admittedly, rather small at this point) this isn’t really what I’ve been doing, as I’ve been spiking out the primitives for encrypted storage. Once these are in place though, I’d like to build a system whose frontend is a web browser, but whose backend runs locally on the same machine, performing encryption and P2P activities. </p>
<p>The end result, in my mind, is a system that allows people to write secure HTML/JS applications able to tap into a rich cryptographic backend running locally on the same machine, a backend which could hopefully provide many of the things people presently run their own servers for today. This is an idea born out of <a href="http://techcrunch.com/2011/09/01/strobe-launches-game-changing-html5-app-platform/" rel="nofollow">Strobe</a>, a startup I used to work for: with the right set of canned backend services and a way to securely deploy and manage HTML/JS applications, you can author an application which lives entirely in the browser.</p>
<p>It’s ambitious, but it’s an idea I have been <a href="http://lists.zooko.com/pipermail/p2p-hackers/2011-August/002979.html" rel="nofollow">thinking about for quite some time</a>. If this sounds interesting to you, please join the <a href="https://groups.google.com/group/cryptosphere" rel="nofollow">Google Group</a> or <a href="https://twitter.com/thecryptosphere" rel="nofollow">follow us on Twitter</a>.</p>
tag:tonyarcieri.com,2014:Post/lets-figure-out-a-way-to-start-signing-rubygems2013-02-01T09:00:00-08:002013-02-01T09:00:00-08:00Let's figure out a way to start signing RubyGems<p>Digital signatures are a passion of mine (as is infosec in general). Signatures are an integral part of my cryptosystem <a href="https://github.com/livingsocial/keyspace" rel="nofollow">Keyspace</a> for which I wrote the <a href="https://github.com/tarcieri/red25519" rel="nofollow">red25519</a> gem. The red25519 gem’s sole purpose was to expose the state-of-the-art <a href="http://ed25519.cr.yp.to/" rel="nofollow">Ed25519 digital signature algorithm</a> in Ruby. I have since moved on to implementing Ed25519 in the much more comprehensive <a href="https://github.com/cryptosphere/rbnacl" rel="nofollow">RbNaCl</a> gem. Point being, I have longed for a modern, secure digital signature system in Ruby and have been working hard to make that a reality.</p>
<p>Digital signatures are something I think about almost every single day, and that’s probably fairly unusual for a Rubyist. That said, if you do work with Ruby, you have hopefully been informed that <a href="http://news.ycombinator.com/item?id=5139583" rel="nofollow">RubyGems.org was compromised</a> and that the integrity of all gems it hosted is now in question. Because of this, RubyGems.org is down. As someone who thinks about digital signatures every day, I have certainly thought quite a bit about how digital signatures could’ve helped this situation.</p>
<p>And then I talk to people… not just one person, but several people, who make statements like “signing gems wouldn’t have helped here”.</p>
<p>These people are wrong. They are so wrong. They are so so very wrong. They are so so very very wrong I can’t put it in words, so have a double facepalm:</p>
<p><a href="http://img.svbtle.com/bascule_24474456048384_raw.png" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/inline_bascule_24474456048384_raw.png" alt="double_facepalm.png"></a></p>
<p>These people are so wrong I’m not even going to bother talking about how wrong they are. Suffice it to say that I think about digital signatures every day, and people making this claim probably don’t (if you do, I’d love to hear from you), but I can also provide an analogy about what they’re saying:</p>
<p>The lock which secures the door of your home can be picked. It doesn’t matter how expensive a lock you buy, someone can pick it. You can buy a top-of-the-line Medeco lock. It doesn’t matter. Just go to Defcon, and watch some of the top lockpickers in the world open 10 Medeco locks in a row. Because locks are pickable, they are pointless, therefore we shouldn’t put any locks on our doors at all.</p>
<p>Does that sound insane to you? I accept the truth that locks are easily pickable, but I certainly want locks on anything and everything I own. Putting locks on things you want to secure is common sense, and part of a strategy known as <a href="http://bit.ly/euToZF" rel="nofollow">defense in depth</a>. If someone tells you not to put a lock on your door just because locks are pickable, they’re not really a trustworthy source of security advice. We should start signing gems in the hope that the practice gains enough traction to be useful to everyone, not give up in despair because signing is imperfect. If you disagree, I wonder how willingly you’d part with the easily pickable deadbolt on your front door.</p>
<p>I can’t offer a perfect system with unpickable locks, but I think we can practically deploy a system which requires few if any changes to RubyGems, and I also have suggestions about things we can do to advance the state-of-the-art in RubyGems security. So rather than trying to micro-analyze the arguments of people who say digital signatures don’t work (which, IMO, are flimsy at best), let’s just jump into my plan and let me tell you how they could work.</p>
<h2>Understanding the Problems</h2>
<p>I became passionate about signing RubyGems about a year ago. I even had a plan. It was a plan I wasn’t sure was entirely a good idea at the time, but now I certainly think it’s a good idea. I talked about it with several people, including RubyGems maintainers like Eric Hodel. I can’t say he was thrilled about my idea, but he thought it was at least good enough not to tell me no. Now that the RubyGems.org hack has happened, I think it’s time to revisit the idea.</p>
<p>First, let’s talk about what we’d like to accomplish by signing gems. Let’s start with a hypothetical (but not really!) scenario: you want to add some gems to a Ruby app while RubyGems.org has been compromised (you know, like what just happened), and someone has uploaded a malicious version of some obscure Rails dependency like “hike”. You probably aren’t thinking to look at the source code of the “hike” gem, are you? In fact you probably have no idea what the hike gem is (that’s okay, before today neither did I).</p>
<p>Let’s say someone has done all this, and RubyGems.org doesn’t even know it has happened yet. Meanwhile you’re doing “bundle update rails”, possibly in response to the very security vulnerabilities which led RubyGems.org to be hacked in the first place. After doing this, bundler has just downloaded the compromised version of the “hike” gem, and you are completely unaware.
You now deploy your app with the compromised “hike” gem (along with anyone else who has upgraded Rails during the window in which RubyGems.org has been compromised) and now your app contains a malicious payload.</p>
<p>Does a malicious payload scare you? It should. Do you even know what power gems have over your system? Even gem installation gives a malicious gem maker wide-ranging control over your system, because gemspecs take either the form of arbitrary Ruby code or YAML files which are known to have a code execution vulnerability.</p>
<p>Need some concrete examples? Check out some of the gems Benjamin Smith wrote last year. <a href="https://github.com/benjaminleesmith/be_truthy" rel="nofollow">Here’s a gem that will hijack the sudo command, steal your password, then use it to create a new administrative user, enable SSH, and notify a Heroku app of the compromise</a>. And here’s another gem he made which <a href="https://github.com/benjaminleesmith/better_date_to_s/blob/1f855de5483668bd74f97c33aa7d09c9318cc6f6/lib/better_date_to_s/better_date_to_s.c" rel="nofollow">uses a native extension to copy your entire app’s source code into the public directory</a> then notifies a Heroku app so an attacker can download it. Malicious gems have wide-ranging powers and can do some really scary stuff!</p>
<p>Whose fault is it your app is now running a malicious payload? Is it RubyGems fault for getting hacked in the first place? Or is it your fault for putting code into production without auditing it first?</p>
<p>Here’s my conjecture: RubyGems is going to get hacked. I mean, it already did. We should just anticipate that it’s going to happen again and design the actual RubyGems software in such a way that it doesn’t matter if RubyGems gets hacked: we can detect modified code and prevent it from ever being loaded within our applications. This isn’t quite what RubyGems provides today with its signature system (I sure wish RubyGems would verify gems as the very first thing it does, before anything else!) but it’s close, and it’s a goal we should work towards.</p>
<p>There’s a more general concept behind the idea that a site like RubyGems, which stores a valuable resource like Ruby libraries, can get hacked and yet an attacker still cannot confuse you into loading malicious code, because we have cryptosystems in place to detect and reject forgeries. That idea is the <a href="http://en.wikipedia.org/wiki/Principle_of_least_privilege" rel="nofollow">Principle of Least Authority</a>. Simply speaking, the principle of least authority says that to build secure systems, we must give each part as little power as possible. I think it is unwise to rely on RubyGems to deliver us untainted gems. That’s not to say those guys aren’t doing a great job, it’s just that it’s inevitable that they will get hacked (as has been demonstrated empirically).</p>
<p>A dream system, built around digital signatures, should ensure that there’s no way someone could forge gems (obligatory RubyForge joke here) for any particular project without compromising that project’s private key. Unfortunately RubyGems does not presently support project-level signatures. That’s something I’ll talk about later. But first, let’s talk about what RubyGems already has, and how that is already useful to the immediate situation.</p>
<p>RubyGems supports a signature system which relies on RSA encryption of a SHA1 hash, which is more or less RSAENC(privkey, SHA1(gem)). This isn’t a “proper” digital signature algorithm but is fairly similar to systems seen in <a href="http://eprint.iacr.org/2012/524.pdf" rel="nofollow">Tahoe: The Least Authority Filesystem</a> and SSH. RubyGems can maintain a certificate registry and check if all gems are signed, and prevent the system from starting in the event there are unsigned gems.</p>
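<p>To give a rough feel for what this construction amounts to, here’s a sketch using Ruby’s OpenSSL bindings. This is a proper RSA signature over a SHA1 digest, in the same spirit as the scheme described above (the gem filename is just for illustration):</p>
<pre><code class="ruby">require 'openssl'

# Gem author: generate an RSA keypair and sign the gem's contents
key = OpenSSL::PKey::RSA.new(2048)
gem_data  = File.binread("hike-1.2.1.gem")
signature = key.sign(OpenSSL::Digest.new("SHA1"), gem_data)

# Gem installer: verify against the author's public key before
# loading a single line of code from the gem
pub = OpenSSL::PKey::RSA.new(key.public_key.to_pem)
unless pub.verify(OpenSSL::Digest.new("SHA1"), signature, gem_data)
  abort "gem has been tampered with!"
end
</code></pre>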
<p>It’s close to what’s needed, and would provide quite a bit even in its current state. For example, let’s say RubyGems had digital signature support and also had trusted offsite backups of their database that they knew weren’t compromised. They’d be able to restore a backup of their database, and from that restore their own trust model of who owns what gems. Just in case, they could ask everyone to reset their passwords and re-upload their public keys. Perhaps we could keep track of people whose public keys changed during the reset process and flag them for further scrutiny, just in case an attacker was trying to compromise the system via this whole system-reset process.</p>
<p>Once this has happened, RubyGems could then verify gems against the public keys of their owners. This would allow RubyGems to automatically verify many gems, and quarantine those which can’t be checked against their owners’ certificates. This process is relatively easy to automate and could’ve gotten RubyGems back online with a limited set of cryptographically verifiable gems in much less time than it’s taken to date.</p>
<p>This is all well and good, but what happens if an attacker manages to forge a certificate which RubyGems accepts as legitimate? RubyGems needs some kind of trust root to authenticate certificates against. If it had such a trust root, it could compare things like email address on RubyGems.org accounts (which it could double check via an email confirmation) against the one listed in the certificate to at least ensure a given certificate was valid for a given email address. It could also look at a timestamp and ensure that timestamp was significantly prior to a given attack. But to do any of that it would need a trust root which isn’t RubyGems itself.</p>
<h1>The Identity Problem: Solving Zooko’s Triangle</h1>
<p>Who can RubyGems trust if not RubyGems? Is there a way to build a distributed trust system where there is no central authority? Can we rely on a web-of-trust model instead?</p>
<p>The answer is: kind of in theory, probably not in practice. We could allow everyone who wants to publish a gem to sign it with their private key. Their certificate could include their name and cryptographic proof that they at least claim that’s their name. Let’s call this person Alice.</p>
<p>Is it a name we know? Perhaps it is! Perhaps we met Alice at a conference. Perhaps we thought she was really cool and trustworthy. Awesome! A name we know. But is Alice really Alice? Or is “Alice” actually a malicious <a href="http://research.microsoft.com/pubs/74220/IPTPS2002.pdf" rel="nofollow">Sybil</a> (false identity, a.k.a. sock puppet) pretending to be the real Alice?</p>
<p>We don’t know, but perhaps we can Google around for Alice and attempt to find her information on the intarwebs. Hey, here’s her blog, her Twitter, etc. all with her name and recent activity that seems plausible.</p>
<p>If we really want to be certain Alice’s public key is authentic, now we need to contact her. You’ve found her Twitter, so perhaps you can tweet at her “Hey Alice, is this your key fingerprint?” Alice may respond yes, but perhaps the attacker has stolen her phone and thus compromised Alice’s Twitter account. (Silly Alice, should’ve used full disk encryption on your phone ;)</p>
<p>To really do our due diligence, perhaps we can try to authenticate Alice through multiple channels. Her Twitter, her email, her Github, etc. But it sure would be annoying to Alice if everyone who ever wanted to use Alice’s gems had to pester Alice across so many channels just to make sure Alice’s signature belongs to the real Alice.</p>
<p>It sure would be nice if we could centralize this identity verification process, and have a cryptographically verifiable certificate which states (until revoked, see below) that Alice really is Alice and this really is her private key, and this really is her email address, and this really is her Github, and so on.</p>
<p>Is there any option but centralization in this case? Not really… we’ve run afoul of Zooko’s triangle:</p>
<p><a href="http://img.svbtle.com/bascule_24474612470688_raw.png" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/inline_bascule_24474612470688_raw.png" alt="Screen Shot 2013-01-31 at 10.37.35 PM.png"></a></p>
<p>Modeling identity is a hard problem. We can attempt to tie identities to easily memorable, low-entropy things like names or email addresses, but doing that securely is rather difficult, as I hope I’ve just illustrated. We could also attempt to tie identities to hard-to-remember things: large numbers resulting from irreversible transformations of other large random numbers, where computing the inverse (e.g. a discrete logarithm) is mathematically intractable (or at least it’s intractable until <a href="http://pqcrypto.org/" rel="nofollow">quantum computers happen</a>).</p>
<p>If we choose to identify people by random-looking numbers, we don’t need to have any type of trust root. Per the diagram of Zooko’s triangle above, random-looking numbers can be used as the basis of a “global” system where we have no central trust authority.</p>
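<p>In practice the “random-looking number” is usually a fingerprint, i.e. a cryptographic hash of a public key. Computing one requires no trust root whatsoever, which is exactly the appeal (and, as we’re about to see, the problem). A quick sketch:</p>
<pre><code class="ruby">require 'openssl'
require 'digest'

# Alice's identity in a "global" system: a hash of her public key.
# Anyone can compute and compare it without consulting any authority...
key = OpenSSL::PKey::RSA.new(2048)
fingerprint = Digest::SHA256.hexdigest(key.public_key.to_der)

puts fingerprint  # 64 hex characters: unique, secure... and unmemorable
</code></pre>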
<p>But there’s a problem: random-looking numbers give us no information about whether or not we trust a particular person who holds a particular public key. We know absolutely nothing from random-looking numbers. So we’re back to chasing down the person whose name appears in a certificate and asking them if we really have the right number.</p>
<p>To really have a practical system where we can centralize and automate the process of tying people’s random-looking numbers to the memorable parts of their identity, like names and email addresses, we can’t have a global system. We need a centralized one. We need someone whom we can trust to delegate this authority to, someone who will be diligent about at least attempting to verify people’s identities, and who will issue signed certificates which are not easily obtainable, where manual steps in the process and human-level scrutiny of the inputs stand in the way of active attacks.</p>
<p>This is the basic idea of a certificate authority. In general the idea has probably lost favor in recent years, with the <a href="http://notary.icsi.berkeley.edu/trust-tree/" rel="nofollow">HTTPS CA system fragmenting into an incomprehensibly complex quagmire of trust relationships</a>. But I’m not proposing anything nearly that complex.</p>
<p>I propose someone steps up and runs some kind of certificate authority/trust authority for RubyGems. You may be thinking that RubyGems itself is best equipped to do this sort of thing, but I think there would be value in having some 3rd party specifically interested in security responsible for this endeavor. At the very least, as a different organization running what is hopefully a different codebase on a different infrastructure, there’s some defense in depth as far as not having all your eggs in one basket. Separating the trust authority from the gem monger makes for two targets instead of one.</p>
<p>I’m not saying this CA should be run like practically any other CA in existence. It should be run by volunteers and provided free-of-charge. It should be unobtrusive, relying on fragments of people’s online identities to piece together their personhood and gauge whether or not they should be issued a certificate. It should be noob-friendly, and not require that people are particularly well-known or have a large body of published software, but should still perform due diligence in ascertaining someone’s identity.</p>
<p>I don’t think any system which relies on unauthenticated third party certificates can provide any reasonable kind of security. I think, at the very least, a CA is needed to impede the progress of the malicious.</p>
<h1>PKI sucks! Ruby security is unfixable!</h1>
<p>Yes, PKI sucks. There’s all sorts of things that can go wrong with a CA. People can hack the CA and steal their private key! If this happens, you probably shouldn’t be running a CA. If a CA relies on a manual review process, people can also social engineer the CA to obtain certificates to be used maliciously.</p>
<p>This is particularly easy if the CA doesn’t require much in the way of identity verification (which is required if you’re trying to be noob-friendly). Perhaps they’ve already compromised the trust vectors a CA like this would use to review certificates.</p>
<p>These are smaller issues compared to the elephant in the room: If I were to run a CA, you’d have to trust me. Do you trust me? I talk a lot of shit on Twitter. If you judge me solely by that, you probably don’t trust me. But would you trust me to run a CA?</p>
<p>All that said, I’m trying to write security software in Ruby, and I feel like thanks to all the recent security turmoil, Ruby has probably gained a pretty bad reputation security-wise. Worse, it’s not just Ruby’s reputation that’s at stake, these are real problems and they need to be fixed before Ruby can provide a secure platform for trusted software.</p>
<p>I want to write security software in Ruby. I do not feel doing this is a good idea if Ruby’s security story is a joke. So I want to help fix the problem. I feel some type of reasonable trust root is needed to make this system work, and I can vouch for my own paranoia when it comes to detecting malicious behavior. I’d be happy to attempt to run a CA on behalf of the Ruby community, and potentially others, or I’d be happy to help design open source software that can be used for this purpose but administered by an independent body like RubyCentral.</p>
<h1>In a Dream World: The Real Fix</h1>
<p>I spend way too much time thinking about cryptographic security models. In doing so, I always want to use the latest, greatest tools for the job. In that regard, RubyGems’ homebrew signature algorithm falls down. So do DSA and ECDSA, as they’re both vulnerable to entropy failure, as was illustrated a few days ago when <a href="https://plus.google.com/u/0/106313804833283549032/posts/X1TvcxNhMWz" rel="nofollow">reused nonces were discovered signing Bitcoin transactions</a>. The real fix would involve a modern signature algorithm like Ed25519, which would necessitate a tool beyond RubyGems itself (which, by design, relies only on the Ruby standard library).</p>
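<p>For the curious, here’s what the Ed25519 primitive looks like through RbNaCl. This is a sketch of the raw signing operation, not a gem-signing tool (the filename is illustrative):</p>
<pre><code class="ruby">require 'rbnacl'

# Project maintainer: generate an Ed25519 keypair for the project
signing_key = RbNaCl::SigningKey.generate
verify_key  = signing_key.verify_key

gem_data  = File.binread("mygem-1.0.0.gem")
signature = signing_key.sign(gem_data)

# Installer: verify before loading anything; raises
# RbNaCl::BadSignatureError if the gem has been tampered with
verify_key.verify(signature, gem_data)
</code></pre>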
<p>As I mentioned earlier: I think project-specific certificates are the way to go. I don’t think it would be particularly hard to create a CA-like system that issues certificates to individual projects or to the people who maintain gems: a certificate which binds the name of the gem, and is in turn signed by a trusted root.</p>
<p>If I were really to try to run a CA for RubyGems, I would probably try to run it at the project level, and try to have a nontrivial burden of proof that you are the actual owner of a project, including things like OAuth to your Github account and confirming your identity over multiple channels as I described earlier. I’d probably not rely on the existing RubyGems infrastructure at all but ship something separate that could ensure gems are loaded securely, using state-of-the-art cryptographic tools like RbNaCl, libsodium, and Ed25519.</p>
<p>I feel like the gems themselves should be the focus of authority, and the current RubyGems certificate system places way too much trust in individual users and does not provide a gem-centric authority system.</p>
<p>The whole goal of this is what I described earlier: a least authority system where the entire RubyGems ecosystem could be contaminated and RubyGems.org could be serving nothing but malware and the client-side tool could detect the tampering and prevent the gems from even being loaded.</p>
<p>If you’re interested in this sort of system, hit me up on Twitter (I’m @bascule).</p>
<h1>The Challenges Moving Forward</h1>
<p>So I’m probably not going to make the dream RubyGems CA system described above. What can we do in the meantime? I think someone needs to step up and create some kind of a CA system for gems, even if that someone is RubyGems.org themselves.</p>
<p>You can bypass the entire identity verification process and automatically issue signed certificates upon request. Doing so is ripe for abuse by active attackers, and brings up another important aspect of designing a certificate system like this: revocation.</p>
<p>Let’s say the CA has issued someone a certificate, and later discovered they were duped and they have just issued a certificate to Satan instead of Alice. Oops! The CA now needs to communicate the fact that what they said about Alice before, yeah whoops, that’s totally wrong, turns out Alice is Satan. My bad!</p>
<p>Whatever model is applied to secure gems must support certificate revocation. If the modicum of a trust model that a certificate authority which auto-issues certificates to logged-in users (or at best, an unobtrusive volunteer-run CA) can provide is ever violated, which it surely will be, there should be some way to inform people authenticating certificates that there are malicious certs out there that shouldn’t be trusted.</p>
<p>Whatever CA we were to trust needs to maintain a blacklist of revoked certificates, preferably one which can be authenticated with the CA’s public key. This is an essential component of any trust model with a central authority, and one which, as far as I can tell, has not been codified into the existing RubyGems software and its signature system.</p>
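<p>Mechanically, such a blacklist could be as simple as a signed list of revoked certificate fingerprints which clients fetch and verify before trusting anything. A sketch (the file names and one-fingerprint-per-line format are invented for illustration):</p>
<pre><code class="ruby">require 'openssl'

# Hypothetical format: newline-delimited fingerprints of revoked certs,
# with a detached signature from the CA over the entire list
list      = File.read("revoked.txt")
signature = File.binread("revoked.txt.sig")
ca_key    = OpenSSL::PKey::RSA.new(File.read("ca_public.pem"))

# Refuse to use the blacklist at all unless the CA actually signed it;
# otherwise an attacker could feed us an empty list
unless ca_key.verify(OpenSSL::Digest.new("SHA256"), signature, list)
  abort "revocation list has been tampered with!"
end

revoked = list.split("\n")
trusted = lambda { |fingerprint| !revoked.include?(fingerprint) }
</code></pre>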
<h1>Now What?</h1>
<p>This is a problem I’d love to help solve. I don’t know the best solution. Perhaps we place all our trust in RubyGems, or perhaps we set up some other trust authority that maintains certificates. I think we can all agree whatever solution comes about should be fully open source and easily audited by anyone. <a href="http://en.wikipedia.org/wiki/Kerckhoffs&#39;s_principle" rel="nofollow">Kerckhoffs wouldn’t have it any other way</a>.</p>
<p>As the post title implies, I don’t have any definitive answers, but I think we need something to the tune of a CA to solve this problem, and for everyone to digitally sign their gems using certificates which are in turn signed by some central trust authority, So Say We All (until said authority turns out to be malicious, at which point we abandon them for a Better Trust Authority, and on and on ad infinitum).</p>
<p>Building a trust model around how we ship software in the Ruby world is a hard problem, but it’s not an unsolvable one, and I think any good solutions will still work even in the wake of a total compromise of RubyGems.org.</p>
<p>In the words of That Mitchell and Webb Look: come on boffins, let’s get this sorted!</p>
<p>(Edit: If you are interested in this idea, please take a look at <a href="https://github.com/rubygems-trust" rel="nofollow">https://github.com/rubygems-trust</a>. Also check out the #rubygems-trust IRC channel on Freenode)</p>
tag:tonyarcieri.com,2014:Post/dci-in-ruby-is-completely-broken2013-01-02T08:40:00-08:002013-01-02T08:40:00-08:00"DCI" in Ruby is completely broken<p>Rubyists are people who generally value elegance over performance. “CPU time is cheaper than developer time!” is a mantra Rubyists have repeated for years. Performance has almost always taken a second seat to producing beautiful code, to the point that Rubyists chose what used to be (<a href="http://www.unlimitednovelty.com/2012/06/ruby-is-faster-than-python-php-and-perl.html" rel="nofollow">but is no longer</a>) the slowest programming language on earth in order to get things done.</p>
<p>You can file me under the “somewhat agree” category, or otherwise I’d be using languages with a better performance track record like Java, Scala, or even Clojure. That said, I like Ruby, and for my purposes it has been fast enough. I also like to make light of those who would sacrifice elegance for speed, at least in Ruby, calling out those who would do silly stuff “FOR SPEED!” while compromising code clarity (otherwise known as roflscaling).</p>
<p>The past year though, there’s been an idea creeping through the Ruby community whose performance implications are so pathological I think it needs to die now. That idea is “DCI”, which stands for Data, Context, and Interaction. <a href="http://www.artima.com/articles/dci_vision.html" rel="nofollow">The DCI Architecture</a> paper makes the following claims:</p>
<blockquote class="large">
<p>Imagine that we might use something like delegation or mix-ins or Aspects. (In fact each of these approaches has at least minor problems and we’ll use something else instead, but the solution is nonetheless reminiscent of all of these existing techniques.)</p>
<p>…</p>
<p>In many ways DCI reflects a mix-in style strategy, though mix-ins themselves lack the dynamics that we find in Context semantics.</p>
</blockquote>
<p>DCI, as envisioned by its creators, is something distinct from mix-ins or delegation, however it seems in Ruby these are the two ways that people have chosen to implement it. This makes me believe that the way Rubyists are attempting to implement DCI does not live up to the original idea, but that’s a subject for a different blog post.</p>
<p>Far and away, the main pattern we see described as “DCI” in Ruby works by using a mix-in on an individual instance. This pattern is what I will be referring to from now on as “DCI”:</p>
<pre><code class="ruby">class ThisBlogContext
def initialize(rubyist)
rubyist.extend(Fool)
end
end
</code></pre>
<p>Let’s look at what’s happening here. First, we’re mixing the “Fool” module into the metaclass of the “rubyist” object. Since this is an instance-specific modification, unless “rubyist” already has an instance-specific metaclass, the Ruby VM needs to allocate a new one which will exist for the lifecycle of this object. Okay, so we’re allocating more memory than we otherwise would. That doesn’t seem too bad, does it?</p>
<p>Well, unfortunately there’s something far more sinister going on here inside the depths of any Ruby VM you happen to be using. The main thing we’re doing here is modifying the class hierarchy at runtime. Depending on when this is occurring, this can have very, very bad non-local effects. But before I get into that, let’s look at some benchmarks:</p>
<p><em>NOTE: you will need the benchmark-ips gem for this snippet</em></p>
<pre><code class="ruby">require 'rubygems'
require 'benchmark/ips'
class ExampleClass
def foo; 42; end
end
module ExampleMixin
def foo; 43; end
end
Benchmark.ips do |bm|
bm.report("without dci") { ExampleClass.new.foo }
bm.report("with dci") do
obj = ExampleClass.new
obj.extend(ExampleMixin)
obj.foo
end
end
</code></pre>
<p>And the results:</p>
<p><a href="http://img.svbtle.com/bascule_24425838726804_raw.png" rel="nofollow"><img src="https://d23f6h5jpj26xu.cloudfront.net/inline_bascule_24425838726804_raw.png" alt="DCI benchmark results"></a></p>
<p>Using DCI is about an order of magnitude slower (or in the case of Rubinius, <em>four</em> orders of magnitude slower) than simply instantiating an object. Okay, so DCI is slow, right? Big deal, plenty of things are slow. But should we really care? The actual bottleneck here is going to be talking to the database or something, right? Actually, something far more sinister is going on here…</p>
<p>What if I were to tell you that the performance impact you’re seeing here wasn’t just localized to the little snippet we’re microbenchmarking, but is in fact having non-local effects that are causing similar performance degradations throughout your Ruby application? Scared now?</p>
<p>This is exactly what’s happening. All Ruby VMs use method caches to improve dispatch speed. They can, for example, cache which method to use based on the types flowing through a particular call site. These caches remain valid so long as we don’t see new types and the class hierarchy doesn’t change.</p>
<p>Unfortunately, what this approach to DCI is doing by using an instance-specific mixin is making modifications to the class hierarchy at runtime, and by doing so, it’s busting method caches throughout your application. By busting these caches everywhere, the performance effects you see aren’t localized just to where you’re using DCI. You’re taking a pathological performance hit every time you use <code>obj.extend(Mixin)</code>.</p>
<p>This becomes especially problematic if you’re performing these sorts of runtime mixins every time you handle a request (or worse, multiple times per request). By doing so, you are preventing these caches from ever filling, and forcing the VM to dispatch methods in the most pathological way possible every time you use this feature.</p>
<p>Ruby gives you a lot of expressive power, but with great power comes great responsibility. My advice to you is to completely avoid using any of Ruby’s dynamic features which alter the class hierarchy after your application has loaded. They’re great to use when loading your application, but once your app has been loaded, you should really avoid doing anything that creates instance-specific metaclasses. This isn’t just limited to <code>extend</code> on objects but also includes things like doing <code>def</code> within a <code>def</code>, <code>def obj.method</code>, <code>class &lt;&lt; obj</code> then making modifications, or <code>define_method</code>.</p>
<p>What’s a better approach to doing something like DCI that doesn’t blow your method cache? Delegation. <a href="http://evan.tiggerpalace.com/articles/2011/11/24/dci-that-respects-the-method-cache/" rel="nofollow">Evan Light blogged over a year ago about DCI that respects the method cache</a> using SimpleDelegator. In addition to respecting your method cache, I personally also find this approach a lot cleaner.</p>
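<p>For a flavor of what that looks like, here’s a minimal sketch of the delegator approach (the class names are invented for illustration). The role is an ordinary class wrapping the object, so no instance-specific metaclass is created and no method caches are invalidated:</p>
<pre><code class="ruby">require 'delegate'

class Rubyist
  def write_post; "42 paragraphs of hot takes"; end
end

# The "role" is a plain delegator: no runtime extend, no new metaclass,
# no global method cache invalidation
class Fool &lt; SimpleDelegator
  def write_post
    super + " (now with extra foolishness)"
  end
end

rubyist = Rubyist.new
Fool.new(rubyist).write_post  # the rubyist plays the Fool role for a while
</code></pre>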
tag:tonyarcieri.com,2014:Post/2012-the-year-rubyists-learned-to-stop-worrying-and-love-the-threads2012-12-18T09:20:00-08:002012-12-18T09:20:00-08:002012: The Year Rubyists Learned to Stop Worrying and Love Threads (or: What Multithreaded Ruby Needs to Be Successful)<p>Let me provide a very different picture of how Rubyists used to view threads versus what the title of this post implies about now. I’m not talking about 2005 in the early days of Rails. I’m talking about Dr. Nic’s talk at RubyConf 2011, a little more than a year ago. Dr. Nic had a fairly simple message: when performance matters, build multithreaded programs on JRuby (also: stop using EventMachine). Now granted, he was working for the company that was subsidizing JRuby development at the time, but I wasn’t, and I for one strongly agreed with him. Not many other people in the room did. The talk seemed to be met with a lot of incredulity.</p>
<p>“I thought this was going to be a talk on EventMachine!” said That Guy. Perhaps what That Guy missed was that Dr. Nic had hosted EventMachineConf as a subconference of RailsConf a few months before. And now Dr. Nic was saying don’t use EventMachine, use threads. And Dr. Nic is certainly <a href="https://raw.github.com/gist/e1744a804a6f7469b022/09db938de5063a7ff70d367fa608cd61c0e735c0/gistfile1" rel="nofollow">not the only one who has come to this conclusion</a>.</p>
<p>Flash forward to 2012 and I think the Ruby community has completely changed its tune. I may be a bit biased, but if I were to pick an overall “theme” (or perhaps “tone”) of RubyConf 2012, it’s that the single-core nature of the canonical Ruby interpreter, MRI (the “Matz Ruby Interpreter”), is limiting Ruby’s potential applications.</p>
<p>There was a lot of buzz about JRuby this year, and Brian Ford, one of the primary developers of Rubinius, announced the first 2.0 prerelease. Both of these Ruby implementations support parallel execution of multithreaded Ruby programs on multicore CPUs. I talked to a lot of people who are interested in my Celluloid concurrent object library as well.</p>
<p>At the same time RubyConf 2012 marked the first “Ruby 2.0” prerelease. Many of the talks covered upcoming Ruby 2.0 features, most notably refinements. Much like the release of Rails 2.0, this felt a bit underwhelming. What the crowd was clamoring for was what would be done with the Global Interpreter Lock (or GIL, or perhaps more appropriately the Global VM Lock or GVL in ruby-core parlance).</p>
<p>At the end of the conference, Evan Phoenix sat down with Matz and asked him various questions posed by the conference attendees. One of these questions was about the GIL and why such a substantial “two dot oh” style release didn’t try to do something more ambitious like removing the GIL and enabling multicore execution. Matz looked a bit flustered by it, and said “I’m not the threading guy”.</p>
<p>Well Matz, I’m a “threading guy” and I have some ideas ;)</p>
<p>Personally I’m a bit dubious about whether or not removing the GIL from MRI is a good idea. The main problems would be the initial performance overhead of moving to a fine-grained locking scheme, and also the bugs that would crop up as a large codebase originally intended for single-threaded execution is retrofitted into a multithreaded program. I think the large bug trail this would create would hamper future Ruby development, because instead of spending their time improving the language itself, ruby-core would spend its time hunting thread bugs.</p>
<p>All that said, there are some features I would personally like to see in Ruby which would substantially benefit multithreaded Ruby programs, GIL or no GIL. I would personally prioritize all of these features ahead of removing the GIL, as they would provide cleaner semantics we need to write correct multithreaded Ruby programs:</p>
<h2>Recommendation #1: Deep freeze</h2>
<p>Immutable state has a number of benefits for concurrent programs. What better way to prevent concurrent state mutation than to prevent <em>any</em> state mutation? (actually there are ways I think are better, but I’ll get to that later) Ruby provides <code>#freeze</code> to prevent modifications to an object, however I don’t think <code>#freeze</code> is enough.</p>
<p>Purely immutable languages allow the creation of immutable persistent data structures. The word “persistent” in this case doesn’t mean written to disk; it means that new versions of a data structure share unchanged parts with older versions, which persist intact alongside them. This approach only works if it’s immutable state all the way down. Ruby supports immutable persistent data structures via the <a href="https://github.com/harukizaemon/hamster" rel="nofollow">Hamster gem</a>, but it would be much easier to work with immutable data if this feature were in core Ruby.</p>
<p>What we need is more than just freezing of individual objects that <code>#freeze</code> provides. As <a href="http://developinthecloud.drdobbs.com/author.asp?section_id=2284&amp;doc_id=256017" rel="nofollow">some recent compiler research into immutability by Microsoft demonstrates</a>, what really matters isn’t the mutability of individual objects but rather <em>aggregates</em> (i.e. object graphs). We need a guaranteed way to freeze aggregates as a whole, rather than freezing just a single object at a time. This means we’d walk all references from a parent object and freeze every single object we find. This would allow for the creation of efficient immutable persistent data structures in Ruby.</p>
<p>What would this look like? Something like <code>Object#deep_freeze</code>. While it’s possible to use Ruby introspection to attempt to traverse all the references a given object is holding recursively (see the <a href="https://github.com/dkubb/ice_nine" rel="nofollow">ice_nine gem</a>) this is something I really feel should be part of the language proper. I also get the idea that VM implementers know exactly where these references are in their implementations and could implement a lot faster version of <code>#deep_freeze</code> than using Ruby reflection to spelunk objects and find their references.</p>
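<p>Until the VM provides this natively, here’s roughly what a reflection-based deep freeze looks like, in the spirit of the ice_nine gem (a simplified sketch; a real implementation must handle many more edge cases):</p>
<pre><code class="ruby"># Recursively freeze an entire object graph, guarding against cycles
def deep_freeze(obj, seen = {})
  return obj if seen[obj.object_id]
  seen[obj.object_id] = true

  case obj
  when Hash  then obj.each { |k, v| deep_freeze(k, seen); deep_freeze(v, seen) }
  when Array then obj.each { |e| deep_freeze(e, seen) }
  else
    obj.instance_variables.each do |ivar|
      deep_freeze(obj.instance_variable_get(ivar), seen)
    end
  end

  obj.freeze
end

config = { hosts: ["a.example.com", "b.example.com"] }
deep_freeze(config)
config[:hosts] &lt;&lt; "evil.example.com"  # raises: can't modify frozen Array
</code></pre>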
<h2>Recommendation #2: Deep dup</h2>
<p>There’s another approach that works equally well when we have objects we’d like to mutate but that we’d also like to share across threads. Before we send objects across threads, we could make a copy, and give the copy to another thread.</p>
<p>Making copies every time we want to pass an object to another thread might sound expensive and wasteful, but there’s a language that’s been very successful at multicore performance which does just that: Erlang. In Erlang, every process has its own heap, so every time a message is passed from one process to another the Erlang VM makes a copy of the data being sent in the message and places the new copy in the receiving process’s heap space. (the exception is binary data, for which Erlang has a shared heap)</p>
<p>Ruby has two methods for making shallow copies of objects, <code>Object#dup</code> and <code>#clone</code> (which are more or less synonymous except <code>#clone</code> copies the frozen state of an object). However, the only built-in way to make a deep copy of entire object graphs is to use <code>Marshal</code>. This is nowhere near ideal for making in-VM copies, because for starters it produces an intermediate string that needs to be garbage collected, not to mention that Marshal uses a complex protocol which precludes the sorts of optimizations that could be done on a simple deep copy operation.</p>
<p>Instead of Marshaling, Ruby could support <code>Object#deep_dup</code> to make deep copies of entire object graphs. This would work much like <code>#deep_freeze</code>, traversing all references in an object graph but constructing an equivalent copy instead of freezing every object. Once a copy has been created, it can be safely sent to another thread. This could be leveraged by systems like Celluloid which control what happens at the boundary between threads. If Celluloid even provided an optional mode for always copying objects sent in messages, then using it would ensure your program was safe of concurrent mutation bugs.</p>
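<p>Today the Marshal round-trip is the idiom people actually reach for; a hypothetical <code>#deep_dup</code> would provide the same semantics as a direct in-VM graph copy, with no intermediate serialization. A sketch of the status quo:</p>
<pre><code class="ruby"># The status quo: a deep copy via Marshal. Correct for marshalable
# objects, but it builds (and then garbage collects) a throwaway
# string just to copy an object graph in memory
original = { user: "alice", roles: ["admin", "ops"] }
copy = Marshal.load(Marshal.dump(original))

copy[:roles] &lt;&lt; "root"
original[:roles]  # =&gt; ["admin", "ops"], the original is untouched

# The proposal (hypothetical, not in core Ruby):
#   copy = original.deep_dup
</code></pre>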
<h2>Bonus points #1: Ownership Transfer</h2>
<p>Copying object graphs every time we pass a reference to another thread is one solution to providing both mutability and thread safety, however making copies of object graphs is a lot slower than a zero copy system. Can we have our cake and eat it too: zero-copy mutable state objects that are free of any potential concurrent mutation bugs?</p>
<p>There’s a great solution to this: we can pin whole object graphs to a single thread at a time, raising exceptions in other threads that may hold a reference to any object in the graph but do not own it and attempt to perform any type of access. This idea is called ownership transfer.</p>
<p>The <a href="http://www.malhar.net/sriram/kilim/" rel="nofollow">Kilim Isolation-Typed Actor system</a> for Java is one implementation of this idea. Kilim supports the idea of “linear ownership transfer”: only one actor can ever see any particular object graph in the system, and object graphs can be transferred wholesale to other actors, but cannot be shared. For more information on the messaging model in Kilim, I definitely suggest you check out <a href="http://www.youtube.com/watch?v=37NaHRE0Sqw#t=10m4s" rel="nofollow">the portion of Kilim-creator Sriram Srinivasan’s talk</a> on the isolation system Kilim uses for its messages.</p>
<p>Another language that supports this approach to ownership transfer is Go. References passed across channels between goroutines change ownership. For more information on how this works in Go, I recommend checking out <a href="http://golang.org/doc/codewalk/sharemem/" rel="nofollow">Share Memory By Communicating</a> from the Go documentation. (Edit: I have been informed that Go doesn’t have a real ownership transfer system and that the idea of ownership is more of a metaphor, which means the safety guarantees around concurrent mutation are as nonexistent as they are in Ruby/Celluloid)</p>
<p>Ruby could support a similar system with only a handful of methods. We could imagine <code>Object#isolate</code>. Like the other methods I’ve described in this post, this method would need to do a deep traversal of all references, isolating them as well so as to isolate the entire object graph.</p>
<p>Moreover, to be truly effective, isolation would have to apply to any object that an isolated object came in contact with. If we add an object to an isolated array, the object we added would also need to be isolated to be safe. This would also have to apply to any objects referenced from the object we’re adding to the isolated aggregate. Isolation would have to spread like a virus from object-to-object, or otherwise we’d have leaky bits of our isolated aggregate which could be concurrently accessed or mutated without errors.</p>
<p>If a reference to an isolated object were to ever leak out to another thread, and that thread tried to reference it in any way, the system would raise an <code>OwnershipError</code> informing you that an unpermitted cross-thread object access was performed. This would prevent any concurrent access or mutation errors by simply making any cross-thread access to objects without an explicit transfer of ownership an error.</p>
<p>To pass ownership to another thread, we could use a method like <code>Thread#transfer_ownership(obj)</code> which would raise <code>OwnershipError</code> unless we owned the object graph rooted in <code>obj</code>. Otherwise, we’ve just given control of the object graph to another thread, and any subsequent accesses by ourselves will result in <code>OwnershipError</code>. If we ever want to get it back again, we will have to hand the reference off to that other thread, and the other thread must explicitly transfer control of the object graph back to us.</p>
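<p>None of this exists in Ruby today, but a toy wrapper can illustrate the intended semantics: accesses from a thread that doesn’t own an object raise, and ownership moves only by explicit transfer. (A real VM-level implementation would cover whole object graphs with zero per-call overhead; everything below is invented for illustration.)</p>
<pre><code class="ruby"># Toy illustration of the proposed semantics, not the real proposal
class OwnershipError &lt; StandardError; end

class Owned &lt; BasicObject
  def initialize(obj)
    @obj   = obj
    @owner = ::Thread.current
  end

  def transfer_ownership(thread)
    check_owner!
    @owner = thread
    self
  end

  # Every access checks ownership before delegating to the real object
  def method_missing(name, *args, &amp;block)
    check_owner!
    @obj.send(name, *args, &amp;block)
  end

  private

  def check_owner!
    unless ::Thread.current == @owner
      ::Kernel.raise OwnershipError, "object is owned by another thread"
    end
  end
end

list = Owned.new([1, 2, 3])
list &lt;&lt; 4                       # fine: we own the object

other = Thread.new { sleep }
list.transfer_ownership(other)
list &lt;&lt; 5                       # raises OwnershipError
</code></pre>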
<p>A system like this would be a dream come true for Celluloid. One of the biggest drawbacks of Celluloid is its inability to isolate the objects being sent in messages, and while either <code>#deep_freeze</code> or <code>#deep_dup</code> would provide solutions to the isolation problem (with various and somewhat onerous tradeoffs), an ownership transfer system could provide effective isolation, zero copy messaging, and preserve mutability of data (which Ruby users will generally expect).</p>
<h2>Bonus points #2: Unified Memory Model and Concurrent Data Structures</h2>
<p>Java in particular is well known for its <code>java.util.concurrent</code> library of thread-safe data structures. Many of these are lock-free equivalents of data structures we’re already familiar with (e.g. <code>ConcurrentHashMap</code>). Others provide things like fast queues between threads (e.g. <code>ArrayBlockingQueue</code>).</p>
<p>It would be great if Ruby had a similar library of such data structures, but right now Ruby does not have a defined memory model (yet another thing Brian Ford called for), and without a memory model shared by all Ruby implementations it seems difficult to define how things like concurrent data structures will behave.</p>
<h1>Conclusion</h1>
<p><a href="https://twitter.com/smarr" rel="nofollow">Stefan Marr</a> recently <a href="https://www.youtube.com/watch?v=eHPp-tpCAZ0" rel="nofollow">gave an awesome talk laying out the challenges for building multicore programs in object-oriented languages</a> which contains a number of points relevant to systems like Celluloid.</p>
<p>Celluloid solves some of the synchronization problems of multithreaded programs, but not all of them. It’s still possible to share objects sent in messages between Celluloid actors, and it’s possible for concurrent mutations to these objects to go unnoticed.</p>
<p>I don’t think I can solve these problems effectively without VM-level support in the form of the aforementioned proposed features to core Ruby. You can imagine being able to do <code>include Celluloid::Freeze</code> or <code>include Celluloid::Dup</code> to control the behavior of how individual actors pass messages between each other. Or, even better, if Ruby had an ownership transfer system Celluloid could automatically transfer ownership of objects passed as operands to another actor or returned from a synchronous call. If that were the case, accesses to objects which have been sent to other threads would result in an exception instead of a (typically) silent concurrent mutation.</p>
<p>Celluloid is your best bet for building complex multithreaded Ruby programs, but it could be better… and we need Ruby’s help. </p>
tag:tonyarcieri.com,2014:Post/all-the-crypto-code-youve-ever-written-is-probably-broken2012-11-12T21:56:00-08:002012-11-12T21:56:00-08:00All the crypto code you've ever written is probably broken<p>tl;dr: use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. </p>
<p><a href="https://plus.google.com/108313527900507320366/posts/cMng6kChAAW" rel="nofollow">Do you keep up on the latest proceedings of the IACR CRYPTO conference</a>? No? Then chances are whenever you have tried to use a cryptographic library you made some sort of catastrophic mistake which would lead to a complete loss of confidentiality of the data you’re trying to keep secret.</p>
<p>The most important question is: are you using an authenticated encryption mode? If you don’t know what authenticated encryption is, then you’ve probably already made a mistake. Here’s a hint: authenticated encryption has nothing to do with authenticating users into a webapp. It has everything to do with ensuring the <em>integrity</em> of your data hasn’t been compromised, i.e. no one has tampered with the message.</p>
<p>Why is authenticated encryption so poorly known despite being so important? I don’t know. Perhaps it’s because the need for it wasn’t <a href="http://cseweb.ucsd.edu/%7Emihir/papers/oem.html" rel="nofollow">formally proven until the year 2000</a>. And chances are you’ve never heard of authenticated encryption at all, because despite the best efforts of the cryptographic community it remains a relatively poorly-known concept.</p>
<p>Most of the cryptographic APIs you’ve ever encountered have probably made you run a gamut of choices for how you want to encrypt data. You might think AES-256 is the way to go, but by default your crypto API might select ECB mode, which is so bad and so terribly insecure it isn’t even worth talking about. Perhaps you select CBC or CTR mode, but your crypto API doesn’t make you specify a random IV and will happily encrypt everything with an IV of all zeroes, which will compromise the confidentiality of your data if you ever reuse the same key.</p>
<p>Let’s say you’ve gotten through all of that and are now using something like AES-CTR mode with a random IV per message. Great. Do you think you’re secure now? Probably not. A sophisticated attacker might attempt a man-in-the-middle attack, which gives him the ability to execute “chosen ciphertext” attacks (CCAs). To defend against these you must also ensure the <em>integrity</em> of your data, or otherwise confidentiality might be lost.</p>
<p>You may have learned you need to use a <a href="http://en.wikipedia.org/wiki/Message_authentication_code" rel="nofollow">MAC</a> to do this (and if you didn’t, you’re most likely insecure!). You may have selected <a href="http://en.wikipedia.org/wiki/Hash-based_message_authentication_code" rel="nofollow">HMAC</a> for this purpose. But you’re still left with three options! Do you compute the MAC of the plaintext or of the ciphertext? If you compute the MAC of the plaintext, do you encrypt it along with the plaintext, or do you append it to the end of the ciphertext? Or, to spell it out more precisely, which of the following do you do?</p>
<ul>
<li>
<strong>Encrypt and MAC</strong>: encrypt the plaintext, compute the MAC of the plaintext, and append the MAC of the plaintext to the ciphertext</li>
<li>
<strong>Encrypt then MAC</strong>: encrypt the plaintext, compute the MAC of the ciphertext, and append the MAC of the ciphertext to the ciphertext</li>
<li>
<strong>MAC then Encrypt</strong>: MAC the plaintext, append the MAC to the plaintext, then encrypt the plaintext and the MAC</li>
</ul>
<p>If you answered any of the above questions incorrectly (the correct answer is “encrypt then MAC”), you’ve quite likely created an insecure cryptographic scheme. Unless you really know what you’re doing and can answer all of these questions correctly (and even then!), you probably shouldn’t be trying to build your own cipher/MAC constructions and should defer to cryptographic experts who specialize in that sort of thing. These cipher/MAC constructions are called authenticated encryption modes.</p>
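<p>For illustration only (remember, the whole point here is that you should <em>not</em> hand-roll this), here’s roughly what a correct encrypt-then-MAC construction looks like in Ruby: independent keys for encryption and authentication, the MAC computed over the IV plus the ciphertext, and a constant-time tag comparison before any decryption happens. Consider it a sketch of the shape of the thing, not a vetted implementation.</p>
<pre><code class="ruby">require "openssl"

def encrypt_then_mac(enc_key, mac_key, plaintext)
  cipher = OpenSSL::Cipher.new("AES-256-CTR")
  cipher.encrypt
  cipher.key = enc_key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  # MAC the IV and the ciphertext, never the plaintext
  tag = OpenSSL::HMAC.digest("SHA256", mac_key, iv + ciphertext)
  iv + ciphertext + tag
end

def verify_and_decrypt(enc_key, mac_key, message)
  iv  = message[0, 16]
  tag = message[-32, 32]
  ciphertext = message[16...-32]
  expected = OpenSSL::HMAC.digest("SHA256", mac_key, iv + ciphertext)
  # Constant-time comparison: don't leak where the tags first differ
  diff = expected.bytes.zip(tag.bytes).map { |a, b| a ^ b }.reduce(0, :|)
  raise "MAC verification failed" unless diff.zero?
  cipher = OpenSSL::Cipher.new("AES-256-CTR")
  cipher.decrypt
  cipher.key = enc_key
  cipher.iv  = iv
  cipher.update(ciphertext) + cipher.final
end

enc_key = OpenSSL::Random.random_bytes(32)
mac_key = OpenSSL::Random.random_bytes(32)
sealed  = encrypt_then_mac(enc_key, mac_key, "attack at dawn")
puts verify_and_decrypt(enc_key, mac_key, sealed)
</code></pre>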
<p>If you find yourself reaching for any form of encryption that isn’t an authenticated encryption mode, you’re probably doing it wrong. You shouldn’t ever be choosing between CBC or CFB or CTR (or god forbid ECB). Unless you’re a cryptographer, these should be considered dangerous low-level primitives not for the consumption of mere mortals.</p>
<p>That said, what should you be using?</p>
<ul>
<li><p><strong><a href="http://en.wikipedia.org/wiki/AEAD_block_cipher_modes_of_operation" rel="nofollow">NIST-approved AEAD block ciphers</a></strong>: AEAD stands for Authenticated Encryption with Associated Data, and refers to ciphers that simultaneously provide confidentiality and integrity of data. Examples of these ciphers include <strong><a href="http://en.wikipedia.org/wiki/EAX_mode" rel="nofollow">EAX</a></strong>, <strong>GCM</strong>, and <strong>CCM</strong> modes. (Edit: some cryptographers have suggested I probably shouldn’t even recommend using these directly, as there are still a number of attacks you will probably be susceptible to unless you know what you’re doing, especially if you have a service accessible to an active attacker)</p></li>
<li><p><strong>djb’s authenticated encryption modes in NaCl</strong>: there are two authenticated encryption modes available in the <a href="http://nacl.cr.yp.to/" rel="nofollow">Networking and Cryptography library</a> by <a href="https://twitter.com/hashbreaker" rel="nofollow">Daniel J. Bernstein</a>: <strong><a href="http://nacl.cr.yp.to/secretbox.html" rel="nofollow">crypto_secretbox</a></strong> and <strong><a href="http://nacl.cr.yp.to/box.html" rel="nofollow">crypto_box</a></strong>, which respectively provide symmetric and pubkey modes of encryption and integrity checking.</p></li>
<li><p>(Edit: adding this retroactively) <strong><a href="http://www.keyczar.org/" rel="nofollow">Google Keyczar</a></strong> also provides a high-level cryptographic toolkit with authenticated encryption modes.</p></li>
<li><p>(Edit: also adding this retroactively) <strong>GPG</strong> is one of the easiest cryptographic tools to use, and provides high-level functionality for cases where you want authenticated encryption.</p></li>
</ul>
<p>EAX is one of the recommended modes and is relatively easy to understand: it’s a combination of AES-CTR mode and <a href="http://www.nuee.nagoya-u.ac.jp/labs/tiwata/omac/omac.html" rel="nofollow">CMAC (a.k.a. OMAC1)</a>, a MAC derived from a block cipher (in this case AES). While EAX mode is relatively simple to understand and you may be tempted to implement it yourself if it’s unavailable in your language environment, you probably shouldn’t: a number of potential pitfalls await you, and unless you know what you’re doing (and even then!) you’re likely to get it wrong.</p>
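<p>By contrast, here’s what a high-level authenticated encryption API looks like from the consumer’s side. This sketch uses the RbNaCl gem’s binding to NaCl’s <code>crypto_secretbox</code>, following the API shown in its README (so treat the exact method names as assumptions about that gem rather than gospel). Note what’s missing: there are no modes to choose, no MAC decisions to make, and decrypting a tampered ciphertext simply raises an error.</p>
<pre><code class="ruby">require "rbnacl"

# One key, one random nonce per message. Confidentiality and
# integrity come as a single package; there is nothing else to choose.
key   = RbNaCl::Random.random_bytes(RbNaCl::SecretBox.key_bytes)
box   = RbNaCl::SecretBox.new(key)
nonce = RbNaCl::Random.random_bytes(box.nonce_bytes)

ciphertext = box.encrypt(nonce, "attack at dawn")
plaintext  = box.decrypt(nonce, ciphertext) # raises if tampered with
</code></pre>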
<p>If I’ve scared you enough by now, you may be googling around to discover whether there’s an implementation of any of the above modes for your programming language, and sadly, in many language environments you may turn up empty. In that case, there’s not much you can do except petition the cryptography specialists who maintain your language’s libraries to expose authenticated encryption APIs.</p>
<p>Authenticated encryption is something you should use as a complete package, implemented as a single unit by a well-reputed open source cryptographic library and not assembled piecemeal by people who do not specialize in cryptography.</p>
<p>Bottom line: unless you’re using authenticated encryption, you are opening yourself up to all sorts of attacks you can’t even anticipate, and shouldn’t consider the data you’re storing confidential.</p>
<p><em>Edit: several people have asked about more information on everything I’ve described here, most notably why various MACing schemes are secure or insecure. If you are really interested in this topic, I strongly recommend you take the <a href="http://crypto-class.org" rel="nofollow">Stanford Crypto class on Coursera</a> which is what inspired me to write this blog post to begin with.</em></p>