Deterministic Builds Part One: Cyberwar and Global Compromise

I've spent the past few months developing a new build system for the 3.0 series of the Tor Browser Bundle that produces what are called "deterministic builds" -- packages which are byte-for-byte identical no matter who actually builds them, or what hardware they use. This effort was extraordinarily involved, consuming all of my development time for over two months (including several nights and weekends), babysitting builds and fixing differences and issues that arose.

When describing my recent efforts to others, by far the two most common questions I've heard are "Why did you do that?" and "How did you do that?". I've decided to answer each question at length in a separate blog post. This blog post attempts to answer the first question: "Why would anyone want a deterministic build process?"

The short answer is: to protect against targeted attacks. Current popular software development practices simply cannot survive targeted attacks of the scale and scope that we are seeing today. In fact, I believe we're just about to witness the first examples of large scale "watering hole" attacks. This would be malware that attacks the software development and build processes themselves to distribute copies of itself to tens or even hundreds of millions of machines in a single, officially signed, instantaneous update. Deterministic, distributed builds are perhaps the only way we can reliably prevent these types of targeted attacks in the face of the endless stockpiling of weaponized exploits and other "cyberweapons".

The Dangerous Pursuit of "Cyberweapons" and "Cyberwar"

For the past several years, we've been seeing a steady increase in the weaponization, stockpiling, and the use of software exploits by multiple governments, and by multiple agencies of multiple governments. It would seem that no networked computer is safe from a discovered but undisclosed and already weaponized vulnerability against one or more of its software components -- with each vulnerability being resold an unknown number of times.

Worse still, with Stuxnet and Flame, this stockpile has grown to include weaponized exploits specifically designed to "bridge the air gap" against even non-networked computers. Examples include exploits against software/hardware USB stacks, filesystem drivers, hard drive firmware, and even disconnected Bluetooth and Wifi interfaces. Even if these exploits themselves don't leak, the fact that they are known to exist (and are known to be deliberately kept secret from manufacturers and developers) means that other parties can begin looking for (or simply re-purchasing) the underlying vulnerabilities themselves, without fear of their disclosure or mitigation.

Unfortunately, the use of such exploits isn't limited to attacks against questionable nuclear energy programs by hostile states. The clock is certainly ticking on how long it will be before multiple other intelligence agencies, along with elements of organized crime and "terrorist" groups, have replicated these weapons.

We are essentially risking all of computing (or at least major sectors of the world economy that are dependent on specific software systems) by stockpiling these weapons, as if there would be any possibility of retaliation after a serious cyberattack. Wake-up call: There is not. In fact, the more exploits exist, the higher the risk of the wrong one leaking -- and it really only takes a chain of just a few of the right exploits for this to happen.

Software Engineering Complexity: The Doomsday Scenario

The core problem is this: With the number of dependencies present in large software projects, there is no way any amount of global surveillance, network censorship, machine isolation, or firewalling can sufficiently protect the software development process of widely deployed software projects in order to prevent scenarios where malware sneaks into a development dependency through an exploit in combination with code injection, and makes its way into the build process of software that is critical to the function of the world economy.

Such malware could be quite simple: One day, a timer goes off, and any computer running the infected software turns into a brick. In fact, it's not that hard to destroy a computer via software. Linux distributions have been accidentally tripping on bugs that do it for two decades now. If the right software vector is chosen (for example, a popular piece of software with a rapid release cycle and an auto-updater), a logic bomb that infects the build systems could continuously update the timestamps in the distributed versions of itself to ensure that the infected computers are only destroyed in the event that the attacker actually loses control of the software build infrastructure. If the right systems are chosen, this destruction could mean the disruption of all industrial control or supply chain systems simultaneously, disabling the ability to provide food, water, power, and aid to hundreds of millions of people in a very short amount of time.

The malware could also be more elaborate, especially if the motives are financial as opposed to purely destructive. The ability to universally deploy a backdoor that would allow modification of various aspects of financial transaction processing, stock markets, insurance records, and the supply chain records of various industries would prove tremendously profitable in the right circumstances. Just about all aspects of business are computerized now, and if the computer systems say an event did or didn't happen, that is the reality. Even short of modification, early access to information about certain events is also valuable -- unreleased earnings data from publicly traded companies being the immediate example.

In this brave new world, without the benefit of anonymity and decentralization to protect single points of failure in the software engineering process from such targeted attacks, I don't believe it is possible to keep software signing keys secure any more, nor do I believe it is possible to keep even an offline build machine secure from malware injection any more, especially against the types of adversaries that Tor has to contend with.

As someone who regularly discusses software engineering practices with the best and the brightest minds in the computer industry, I can tell you with certainty that even companies that exercise current best practices -- such as keeping their software build machines offline (and even these companies are few and far between) -- can still end up being infected, due to the existence and proliferation of the air gap bridging exploits mentioned above.

A true air gap is also difficult to achieve even if it could be used to ensure build machine integrity. For example, all of the major Windows web browser vendors employ a Microsoft run-time optimization technique called "Profile Guided Optimization". This technique requires running an initial compiled binary on a machine to produce a profile output that represents which code paths were executed and which were most expensive. This output is used to transform its code and optimize it further. In the case of browsers, this means that an untrusted, proprietary, and opaque input is derived from non-deterministic network sources (such as the Alexa Top 1000) and transferred to the build machines, to produce executable code that is manipulated and rewritten based on this network-derived, untrusted input, and upon the performance and other characteristics of the specific machine that was used to generate this profile output.

This means that software development has no choice but to evolve beyond the simple models of "Trust our RSA-signed update feed produced from our trusted build machines", or even companies like Google, Mozilla, Apple, and Microsoft are going to end up distributing state-sponsored malware in short order.

Deterministic Builds: Integrity through Decentralization

This is where the "why" of deterministic builds finally comes in: in our case, any individual can use our anonymity network to privately download our source code, verify it against publicly signed, audited, and mirrored git repositories, and reproduce our builds exactly, without being subject to such targeted attacks. If they notice any differences, they can alert the public builders/signers, hopefully using a pseudonym or our anonymous trac account.
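As a rough illustration of that last verification step, here is a minimal sketch of comparing a locally rebuilt package against a published digest. It assumes SHA-256 as the hash and that the published digest was obtained through a channel you already trust; the function names and the idea of a single hex digest per package are illustrative assumptions, not a description of our actual tooling.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(local_package, published_hex_digest):
    """True if the locally rebuilt package hashes to the published digest.

    `published_hex_digest` is a hypothetical input: a hex string copied
    from wherever the official builders announce their hashes.
    """
    local = sha256_of_file(local_package)
    expected = published_hex_digest.strip().lower()
    return hmac.compare_digest(local, expected)
```

With a deterministic build, a mismatch here is meaningful: it points to a difference in the source, the toolchain, or the build machine, rather than to ordinary build noise.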

This also will eventually allow us to create a number of auxiliary authentication mechanisms for our packages, beyond just trusting a single offline build machine and a single cryptographic key's integrity. Interesting examples include providing multiple independent cryptographic signatures for packages, listing the package hashes in the Tor consensus, and encoding the package hashes in the Bitcoin blockchain.
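To make the idea of multiple independent attestations concrete, here is a small sketch of a quorum check: a package is accepted only if enough independent builders report the same hash that you computed locally. The builder names, the threshold, and the dictionary format are all hypothetical, and the sketch assumes each reported digest has already been authenticated (for example by checking that builder's signature), which is not shown.

```python
def enough_builders_agree(local_digest, attestations, threshold):
    """Check a local digest against digests reported by independent builders.

    `attestations` maps a builder name to the hex digest that builder
    reported (a hypothetical format; authenticating each report is
    assumed to have happened elsewhere). Returns a tuple of
    (accepted, sorted list of builders whose digest matches).
    """
    normalized = local_digest.strip().lower()
    agreeing = sorted(
        name
        for name, reported in attestations.items()
        if reported.strip().lower() == normalized
    )
    return len(agreeing) >= threshold, agreeing


if __name__ == "__main__":
    # Illustrative values only; these are not real package hashes.
    reports = {"builder-a": "ab12", "builder-b": "ab12", "builder-c": "ffff"}
    accepted, who = enough_builders_agree("AB12", reports, threshold=2)
    print(accepted, who)
```

The point of the threshold is that an attacker would have to compromise several independent builders at once, rather than a single build machine or a single signing key.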

I believe it is important for Tor to set an example on this point, and I hope that the Linux distributions will follow in making deterministic packaging the norm. Thankfully, due to our close relationship with Debian, after we whispered in a few of the right ears they have started work on this effort. Don't despair guys: it won't take two months for each Linux package. In our case, we had to cross-compile Firefox deterministically for four different target operating system and architecture combinations and fix a number of Firefox-specific build issues, all of which I will describe in the second post on the technical details.

I think what you're saying is that your code won't use Profile Guided Optimization. What sort of performance hit does that mean you'll be taking?

Will you consider upgrading to a more secure base browser that implements process isolation and other modern security features? You don't need to bother owning the build system if you can just exploit the client in the wild.

Owning the build system is a break-once break-everywhere attack that enables the delivery of malware to an entire userbase in a single autoupdate.

On the other hand, exploits are sensitive to the target product's version, OS type and version, processor bitwidth, additional OS-level hardening and sandboxing, and several other factors that may not be visible at the time of exploit delivery.

As for switching browsers, "It is not best to swap horses while crossing the river."

Google knows exactly what they need to do to support a Tor mode that matches the security and privacy standards we've set with Tor Browser. However, organizational schizophrenia prevents even the most basic proxy bypass vulnerabilities from being addressed. It makes no sense to try to use a "more secure" architecture if it can be trivially induced to bypass Tor, or otherwise track you.

Mozilla, on the other hand, is moving in the direction of sandboxing the content window using 'Electrolysis' to provide process-level least privilege isolation. It may be about a year or more before that materializes, but they do want it to happen.

You might want to look at my 2009 PhD dissertation "Fully Countering Trusting Trust through Diverse Double-Compiling" - available at http://www.dwheeler.com/trusting-trust (and other places; the dissertation is released under CC-BY-SA 3.0, GNU FDL, or GPLv2+). In the dissertation I described how to use a deterministic build process of a compiler to counter the "trusting trust" attack. I demonstrated it with gcc and with a maliciously-subverted Lisp compiler.

Dr. Wheeler (if that is your real name ;) - Yes I am aware of your work, and it is awesome. As an approximation/first step, I've been toying with the idea of compiling our chosen compilers' sources on multiple Linux distributions, and still trying to produce identical resulting binaries. This should mitigate Trusting Trust attacks that haven't managed to infect all of the Linux distributions yet.

The thing that concerns me is that even if we do the full DDC with an assembly-audited TinyCC, we still have to worry about the integrity of other packages on the build distribution image. To fully defend against Trusting Trust in scenarios where the malware is capable of jumping in and out of the compiler, it seems we need to use DDC on multiple systems (perhaps involving several cross-compiling architectures) to produce a compiler, kernel, and minimal base OS image that can be used to bootstrap a safe build environment.

Not an impossible task, but certainly a large one. If you're aware of any good interim steps or other ways to mitigate the threat from OS packages, I'd love to hear them.

"DDC on multiple systems (perhaps involving several cross-compiling architectures) to produce a compiler, kernel, and minimal base OS image that can be used to bootstrap a safe build environment," sounds to me like the minimal first effort toward anything to which the word 'secure' could even be laughingly applied.

Anything less is a joke, and that is why we are in the ridiculous world we live in today, where half the news headlines reflect leaks on computerised spying of half the world's population, but the responsible agency can't itself name what may have been leaked, because they lost track.

I'm not sure why an infected compiler source wouldn't detect that it was compiling itself and ensure that it generated 'correct' code while doing so but 'corrupted' code when compiling other programs. Perhaps this is addressed in your thesis which I am currently reading...

Currently packages are hashed by their "build instructions" and source code, which spans all of their dependencies, but once we have working deterministic binary outputs, the hashing can be expanded to cover those outputs as well.

Cyber warfare involves the actions by a nation-state or international organization to attack and attempt to damage another nation's computers or information networks through, for example, computer viruses or denial-of-service attacks. RAND research provides recommendations to military and civilian decisionmakers on methods of defending against the damaging effects of cyber warfare on a nation's digital infrastructure.
To reduce it, one has to understand the hardware and networking of a computer...
more info @ http://ipsol.in/

So, talking about the Perpetual Global War and compromise, one could take another look at the Guardian article I linked to. There are naturally other kinds of compromises which some black hats might wish to use either online or in real life.

There are organisations (NSA, British GCHQ, China, Iran, Soviet Union--old habits die hard) that might try to do stuff like break SSL, or break into some software vendors' systems to practically have a backdoor into software development and include it in each new compromised build, as the OP explained: http://bit.ly/13pVygx (Guardian's Snowden Special from Thu, Sep 4th)

The obviously crazy thing is that sooner rather than later some black hat other than those already mentioned is going to find that backdoor. It's in an unknown number of implementations (and builds), and the compromise code is in unknown versions of SSL (they don't mention TLS) implementations, eg browser https implementations.

So, as some privacy advocates (eg EFF, FSF) warned back in late 2001, by compromising Web freedom and privacy, along with other privacy issues, one is selling one's birthright for a "mess of pottage" (bowl of stew, if I understand it).

So now, much of the "secure" Web is compromised, many commercial encryption software implementations include that NSA backdoor, that we actually knew all the time was there.

Still, for some reason the "Patriot" Act was rammed through Congress without any real debate or even the Intelligence Committees of both houses understanding what they had voted for. Then the broadest possible interpretations of the Act have permitted total surveillance, up to encrypted communication (there are stronger encryption systems without the backdoor and/or other compromise), which the population has trusted to handle their Web purchases, bank transactions, taxes & investment accounts. At the moment, there is no knowing which certificates contain compromised encryption keys.

In other words, it's a clusterfuck. And still, they couldn't tell that the Tsarnaev big brother was surfing extremist web sites, and writing hateful comments, plus behaving in Russia in such a way, that FSB gave two separate warnings to the FBI.

As so many have said, the surveillance is directed towards citizens, to protect governments from protests. Now, that feels like freedom of assembly and expression!

The Guardian reported that the NSA has cracked SSL (online banking etc). They worked with NIST to screw encryption standards, and they have super-computers (quantum?) when that doesn't work.

I think the main vulnerabilities with TOR are that 1) it's run by volunteers, rather than, like BitTorrent, being run by everyone using the service (and sharing bandwidth in a P2P fashion); and 2) to really be secure, we need the most secure encryption that we can have, where there aren't passwords that can be stolen or brute forced, or algorithms that their computers can crack etc.

While I think this deterministic thing is a great effort, I think those are the more pressing issues. But thanks for at least placing a priority on security. (I consider privacy a euphemism for certain aspects of personal security.)

How does a deterministic build affect the ability to "lock down"? What I mean by that is: what if someday a nation state decided that a particular executable wasn't to be allowed under any circumstance. So China one day decides that Tor is really, REALLY not allowed for some period of time. They implement their own weaponized exploit on hundreds of millions of their own PCs. The payload is to prevent an executable with a particular hash from operating. Would deterministic builds simplify this?

No. You can already go to torproject.org and get the hashes of the Tor binaries. Each release of Tor has the same hash on every computer already.

The point of deterministic compiling is to be able to reproduce the build of Tor made by The Tor Project on your own PC, so that you can verify the binary wasn't compromised through the build system. Of course The Tor Project can do the same thing on several of their own computers, for the same effect. Think of it as similar to reproducing a scientific experiment. The more different environments produce the same result, the more we trust the result.

(Also the obvious problem with your scenario is that if the attacker can do this, he already has control so it doesn't really matter.)

Just a question... Wouldn't the solution for deterministic builds for Tor, have to also be implemented for every other package in a distro? Scratching my head as if I am missing something, but it seems distros are vulnerable to weakest link in the chain theory. If they include updates of software that could be/are compromised because they do not conform to the new as-yet not fully implemented process.

If that's true- you need "buy in" of every other package maintainer or else malicious instructions could just re-appear upon the next auto-update. You would almost need to set back update schedules to compensate for longer release candidate and LTS integrity testing?

I love linux, and I have been using it since 1999- but I am not a programmer, so I think I am missing something. I am very concerned that the US or any government wants to make sure every OS is exploitable.

Back to basics then. The only logical conclusion of all of this is ...

- to make Tor and related packages and bundles no longer available for closed and partially undocumented proprietary operating systems that simply cannot be trusted with anonymous communications. This leaves GNU/Linux and BSD as the sole platforms for Tor.

- sane defaults! I can't remember how often I have seen people complaining in this blog about the globally enabled scripts in TBB. And yet, Tor developers have repeatedly insisted on keeping the default so not to inconvenience lazy or incompetent Tor users. This poses the question, by which criteria do we measure the success of a non-commercial project like Tor? Is convenience and popularity really what Tor should be aiming for? Doesn't popularity ultimately always corrupt core values and lead to the loss of a distinct profile?

Never going to happen in the real world, Anonymous. It just is not. Too many people use Windows and OSX and too many people find Linux extremely extremely extremely hard to use, even super-techies like myself.

I would rather shoot off my head with a grenade launcher than use Linux of ANY variety, because it is too command-line only, necessitating that you remember obscure commands.

noob:
Serious question: But is not all of this already "broken", as your ISP records all addresses (you are still using TCP/IP to transport packets over the I-net, and TCP/IP maintains origin/destination/routing info), and as such any connection to another "tor" host would automatically be recorded direct from your connection? The same goes for the exit point, which is after all just another connection through a different ISP. There is no such thing as a host-2-host connection; you have ISPs in the middle.

Of course your ISP knows that you are using Tor, and the exit node's ISP knows that they are connecting to the site. The point is that nobody knows that you are connecting to the site. Your traffic gets lost somewhere within the Tor network.

On September 1st, 2013 Anonymous said: Your final destination is encrypted. It's only decrypted by the exit node, which is not observable by your ISP.
On September 1st, 2013 Anonymous said: Of course your ISP knows that you are using Tor, and the exit node's ISP knows that they are connecting to the site. The point is that nobody knows that you are connecting to the site. Your traffic gets lost somewhere within the Tor network. Look up "how Tor works."

noob:
So you're using VPN style packet encapsulation/encryption and doing packet-forwarding/routing between intermediate nodes. This still does not stop multiple ISP tracking. Your data-delivery-packet is wrapped up inside a public-packet which is delivered via public tcpip addresses. You cannot get away from using public addresses to move the encapsulated packet across the I-Net. The process works in both directions; after all, you're using tor to "view" the remote site, so there is a "stream" of packets between the person initiating and the remote server. The exit points are wide open... 2 line sample from my server log.
[ipaddress removed... "to protect the tor exit point machine ;-) ]

Remember the History Channel documentary on Echelon... from the moment the data packet/stream leaves your machine it is sent (recorded), redirected (recorded), delivered (recorded). ISPs need to make money from your traffic or else they go out of business; alphabet-soup have... other responsibilities.

Friends, I have been unable to use a TOR connection properly for the last 4 days. All of a sudden, if I have good luck, the connection is established and then lost after opening a site or two. Some say that it is on account of piratebrowser users. Please do something in this regard, as I have to send some documents to our member activists.