5 February 2015

Another day,
another data breach,
and another round of calls for companies to encrypt their databases.
Cryptography is a powerful tool, but in cases like this one it's not going
to help. If your OS is secure, you don't need the crypto; if it's not,
the crypto won't protect your data.

In a case like the Anthem breach, the really sensitive databases are
always in use. This means that they're effectively decrypted: the
database management system (DBMS) is operating on cleartext, so the
decryption key is present in RAM somewhere. It may be in the OS, it
may be in the DBMS, or it may even be in the application itself (though
that's less likely if a large relational database is in use, which it
probably is). What's to stop an attacker from obtaining that key, or
perhaps from just making database queries?
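A toy sketch makes the point concrete. The cipher below is a deliberately insecure XOR keystream, not anything you'd really use, and the names (`table`, `lookup`) are mine, not any real product's; the structure, though, is faithful: to answer even one query over an encrypted column, the service must hold the key in memory, and the query path hands back plaintext to anyone allowed to use it.

```python
import hashlib
import secrets

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. A toy for illustration; NOT a real cipher."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# The "database": a sensitive column, encrypted at rest.
key = secrets.token_bytes(32)   # must live in RAM for the DBMS to serve queries
table = {name: toy_cipher(key, ssn.encode())
         for name, ssn in [("alice", "078-05-1120"), ("bob", "219-09-9999")]}

def lookup(name: str) -> str:
    """The legitimate query path: decrypts with the in-memory key."""
    return toy_cipher(key, table[name]).decode()

# The application works -- and so would any attacker who can either read
# `key` out of process memory or simply issue queries like this one.
print(lookup("alice"))
```

The encryption is invisible to anyone who can reach the query interface, which is exactly the situation in a breach of a live system.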

The answer, in theory, is other forms of access control. Perhaps the
DBMS requires authentication, or operating system permissions will prevent
the attacker from getting at the keys. Unfortunately—and as these
many data breaches show—these defenses are not configured properly
or aren't doing the job. If that's the case, though, adding encryption
isn't going to help; the attacker will just go around the crypto.
There's a very simple rule of thumb here:
Encryption is most useful when OS protections cannot work.

What do I mean by that? The most obvious situation is where the
attacker has physical access to the device. Laptop disks should always
be encrypted; ditto flash drives, backup media, etc. Using full disk
encryption on your servers' drives isn't a bad idea, since it protects
your data when you discard the media, but you then have to worry about
where the key comes from if the server crashes and reboots.

Cloud storage is a good place for encryption, since you don't control
the machine room and you don't control the hypervisor. Again, your own
operating system isn't blocking a line of attack. (Note: I'm not saying
that the cloud is a bad idea; if nothing else, most cloud sysadmins are
better at securing their systems than are folks at average small companies.)
Email is another good use for encryption, unless you control your own
mail servers. Why? Because the data is yours, but you're storing it
on someone else's computer.

Encryption is a useful tool (and a fun research area), but like all
tools it's only useful if properly employed. If used in inappropriate
situations, it won't provide protection and will create operational
headaches and perhaps data loss from mismanaged keys.

Protecting large databases like Anthem's is a challenge. We need better
software security, and we need better structural tools to isolate the
really sensitive data from average, poorly protected machines. There may
even be a role for encryption, but simply encrypting the social
security numbers isn't going to do much.

16 February 2015

My Twitter feed has exploded with the
release of the Kaspersky report on the
"Equation
Group",
an entity behind a very advanced family of malware.
(Naturally, everyone is blaming the NSA. I don't know who wrote
that code, so I'll
just say it was beings from the Andromeda galaxy.)

The Equation Group has used a variety of advanced techniques, including
injecting malware into disk drive firmware, planting attack code on
"photo" CDs sent to conference attendees, encrypting payloads using details
specific to particular target machines as the keys
(which in turn implies prior knowledge of
these machines' configurations), and more. There are all sorts of
implications of this report, including the policy question of whether
or not the Andromedans should have risked their commercial market
by doing such things.
For now, though, I want to discuss one particular, deep technical question:
what should a conceptual security architecture look like?

For more than 50 years, all computer security has been based on the
separation between the trusted portion and the untrusted portion of
the system. Once it was "kernel" (or "supervisor") versus "user" mode,
on a single computer. The
Orange Book
recognized that the concept had to be broader, since there were all
sorts of files executed or relied on by privileged portions of the
system. Their newer, larger category was dubbed the "Trusted Computing
Base" (TCB). When networking came along, we adopted firewalls; the
TCB still existed on single computers, but we trusted "inside" computers
and networks more than external ones.

There was a danger sign there, though few people recognized it: our
networked systems depended on other systems for critical files. In a
workstation environment, for example, the file server was crucial, but
it wasn't seen as part of the TCB. It should have been.
(I used to refer to our network of Sun workstations as a single
multiprocessor with a long, thin, yellow backplane—and if you're
old enough to know what a backplane was back then, you're old enough to
know why I said "yellow"…) The 1988 Internet Worm spread with very
little use of privileged code; it was primarily a user-level
phenomenon. The concept of the TCB didn't seem particularly relevant.
(Should sendmail have been considered as part of the
TCB? It ran as root, so technically it was, but very little of it
actually needed root privileges. That it had privileges was more a sign of
poor modularization than of an inherent need for a mailer to be
fully trusted.)

The National Academies report
Trust in
Cyberspace recognized that the old TCB concept no longer made sense.
(Disclaimer: I was on the committee.)
Too many threats, such as Word macro viruses, lived purely at user level.
Obviously, one could have arbitrarily
classified word processors, spreadsheets, etc., as part of the
TCB, but that would have been worse than useless; these things were
too large and had no need for privileges.

In the 15+ years since then, no satisfactory replacement for the TCB
model has been proposed. In retrospect, the concept was not very
satisfactory even when the Orange Book was new. The compiler, for example,
had to be trusted,
even though it was too huge to be trustworthy. (The manual page for
gcc is itself about 90,000 words, almost as long as a short novel—and
that's just the man page; the code base is far larger.)
The limitations have become painfully clear in recent years, with
attacks demonstrated against the embedded computers in batteries,
webcams, USB devices, IPMI controllers, and now disk drives. We
no longer have a simple concentric trust model of firewall, TCB, kernel,
firmware, hardware. Do we have to trust something? What? Where do
we get these trusted objects from? How do we assure ourselves that
they haven't been tampered with?

I'm not looking for concrete answers right now. (Some of the work in secure
multiparty computation suggests that we need not trust anything,
if we're willing to accept a very significant performance penalty.)
Rather, I want to know how to think about the problem. Other than
the now-conceptual term TCB, which has been redefined as "that stuff
we have to trust, even if we don't know what it is",
we don't even have the right words. Is there still such a thing? If so,
how do we define it when we no longer recognize the perimeter of even
a single computer? If not, what should replace it?
We can't make our systems Andromedan-proof if we don't know what
we need to protect against them.

19 February 2015

The most interesting feature of the newly-described
"Equation
Group"
attacks has been the ability to
hide malware
in disk drive firmware.
The threat is ghastly: you can wipe the disk and reinstall
the operating system, but the modified firmware in the disk controller
can reinstall nasties. A common response has been to suggest
that firmware shouldn't be modifiable unless a physical switch is
activated. It's a reasonable thought, but it's a lot harder to
implement than it seems, especially for the machines of most interest
to nation-state attackers.

One problem is where this switch should be. It's easy enough on
a desktop or even a laptop to have a physical switch somewhere. (I've
read that some Chromebooks actually have such a thing.) It's a lot harder
to find a good spot on a smartphone, where space is very precious.
The switch should be very difficult to operate by accident, but findable
by ordinary users when needed. (This means that a switch on the bottom
is probably a bad idea, since people will be turning their devices
over constantly, moving between the help page that explains where the
switch is and the bottom to try to find it….) There will also be the
usual percentage of people who simply obey the prompts to flip the switch
because of course the update they've just received is legitimate…

A bigger problem is that modern computers have lots of
processors, each of which has its own firmware. Your keyboard
has a CPU. Your network cards have CPUs. Your flash drives
and SD cards have CPUs. Your laptop's webcam
has
a CPU.
All of these CPUs have firmware; all can be targeted by malware.
And if we're going to use a physical switch to protect them, we either
need a separate switch for each device or a way for a single switch
to control all of these CPUs. Doing that probably requires special
signals on various internal buses, and possibly new interface standards.

The biggest problem, though, is with all of the computers that
the net utterly relies on, but that most users never see:
the servers.
Many companies have them: rows of tall racks, each filled with
anonymous "pizza boxes". This is where your data lives: your email,
your files, your passwords, and more. There are many of them, and
they're not updated by someone going up to each one and clicking "OK"
to a Windows Update prompt. Instead, a sysadmin (probably an underpaid,
underappreciated, overstressed sysadmin) runs a script that will
update them all, on a carefully planned schedule. Flip a switch?
The data center with all of these racks may be in another state!

If you're a techie, you're already thinking of solutions. Perhaps
we need another processor, one that would enable all sorts of things
like firmware update. As it turns out, most servers already have
a special management processor called
IPMI
(Intelligent Platform Management Interface).
It would be the perfect way to control firmware updates, too, except
for one thing: IPMI itself has
serious
security issues…

A real solution will take a few years to devise, and many more to roll
out. Until then, the best hope is for Microsoft, Apple, and the various
Linux distributions to really harden any interfaces that provide
convenient ways for malware to issue strange commands to the disk.
And that is itself a very hard problem.

27 February 2015

There's been a lot of controversy over the FCC's new
Network
Neutrality
rules. Apart from the really big issues—should there
be such rules at all? Is reclassification the right way to
accomplish it?—one particular point has caught the eye
of network engineers everywhere: the statement that packet
loss should be published as a performance metric, with the consequent
implication that ISPs should strive to achieve as low a value
as possible. That would be a very bad thing to do. I'll give a
brief, oversimplified explanation of why;
Nicholas
Weaver
gives more technical details.

Let's consider a very simple case: a consumer on a phone trying to download
an image-laden web page from a typical large site.
There's a big speed mismatch: the
site can send much faster than the consumer can receive. What will
happen? The best way to see it is by analogy.

Imagine a multiline superhighway, with an exit ramp to a low-speed
local road. A lot of cars want to use that exit, but of course it
can't handle as many cars, nor can they drive as fast. Traffic
will start building up on the ramp, until a cop sees it and stops
letting more cars exit until the backlog has cleared a bit.

Now imagine that every car is really a packet, and a car that can't
get off at that exit because the ramp is full is a dropped packet. What
should you do? You could try to build a longer exit ramp, one that will
hold more cars, but that only postpones the problem. What's really
necessary is a way to slow down the desired exit rate. Fortunately,
on the Internet we can do that, but I have to stretch the analogy
a bit further.
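The "longer ramp only postpones the problem" point is easy to check numerically. In this sketch (the arrival and departure rates are made-up illustrative numbers), cars reach the ramp faster than the local road can absorb them; a buffer of any size eventually fills, and a bigger one merely delays the first drop:

```python
def first_drop_time(buffer_size: int, arrivals_per_tick: int = 10,
                    departures_per_tick: int = 7) -> int:
    """Ticks until the ramp (a FIFO buffer) first overflows.
    Arrival rate exceeds departure rate, so overflow is inevitable."""
    queue = 0
    for tick in range(1, 10_000):
        queue += arrivals_per_tick                 # cars trying to exit
        queue -= min(queue, departures_per_tick)   # cars the road absorbs
        if queue > buffer_size:                    # ramp full: a drop
            return tick
    return -1   # never overflowed (can't happen with these rates)

for size in (30, 300, 3000):
    print(f"buffer {size:5d}: first drop at tick {first_drop_time(size)}")
```

Growing the buffer tenfold delays the first drop roughly tenfold, but it never prevents it; worse, every car then sits in a longer line, which is the "bufferbloat" cost of oversized queues.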

Let's now assume that every car is really delivering pizza to some house.
When a driver misses the exit, the pizza shop eventually notices and
sends out a replacement pizza, one that's nice and hot. That's more
like the real Internet: web sites notice dropped packets, and retransmit
them. You rarely suffer any ill effects from dropped packets, other than
lower throughput. But there's a very important difference here between
a smart Internet host and a pizza place:
Internet hosts interpret dropped packets as a signal to slow down.
That is, the more packets are dropped (or the more cars who are waved past
the exit), the slower the new pizzas are sent. Eventually, the sender
transmits at exactly the rate at which the exit ramp can handle the
traffic. The sender may try to speed up on occasion. If the ramp can
now handle the extra traffic, all is well; if not, there are more dropped
packets and the sender slows down again. Trying for a zero drop rate simply
leads to more congestion; it's not sustainable. Packet drops are the only
way the Internet can match sender and receiver speeds.
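What real senders do is essentially TCP's additive-increase/multiplicative-decrease (AIMD): speed up gently while packets get through, and cut the rate sharply when a drop signals congestion. Here's a much-simplified sketch (one flow, illustrative constants, drops occurring exactly when the send rate exceeds the bottleneck capacity, none of which matches real TCP in detail):

```python
def aimd(capacity: float = 50.0, rounds: int = 200):
    """Additive-increase/multiplicative-decrease, as in TCP congestion
    control (simplified). Returns the rate history and the drop count."""
    rate, drops, history = 1.0, 0, []
    for _ in range(rounds):
        if rate > capacity:   # the "exit ramp" is full: a packet drop
            rate /= 2         # multiplicative decrease on loss
            drops += 1
        else:
            rate += 1         # additive increase while all is well
        history.append(rate)
    return history, drops

history, drops = aimd()
late = history[100:]   # after the initial ramp-up
print(f"{drops} drops; steady-state rate oscillates between "
      f"{min(late):.1f} and {max(late):.1f}, around capacity 50")
```

The sender never settles at zero loss: it probes upward, takes an occasional drop, backs off, and thereby hovers around the bottleneck's capacity. The occasional drop *is* the speed-matching signal.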

The reality on the Internet is far more complex, of course. I'll
mention only two aspects of it; let it suffice to say that congestion
on the net is in many ways worse than a traffic jam.
First, you can get this sort of congestion
at every "interchange". Second, it's not just your pizzas
that are slowed down, it's all of the other "deliveries" as well.

How serious is this? The Internet was almost stillborn because
this problem was not understood until the late 1980s. The network
was dying of "congestion collapse" until
Van Jacobson
and his colleagues
realized what was happening and showed how packet drops would
solve the problem. It's that simple and that important, which
is why I'm putting it in bold italics:
without using packet drops for speed matching, the Internet
wouldn't work at all, for anyone.

Measuring packet drops isn't a bad idea. Using the rate, in isolation,
as a net neutrality metric is not just a bad idea, it's truly horrific.
It would cause exactly the problem that the new rules are
intended to solve: low throughput at inter-ISP connections.