Distributed Computing Sanity Checking

Dozens of distributed computing (DC) projects have recently become available
for interested parties to download and run on their machines. As DC is becoming
more and more popular, groups are jumping on the bandwagon to take advantage of
this wondrous opportunity for "free" computer time. However, as with any other
new technology these days, security and privacy have become important
issues.

The security problem can be divided into two distinct facets: client
security and server security. The former involves security on the computers of
the volunteers running the distributed application. This is extremely
important, perhaps even more important than with other types of software, because DC
clients often communicate over the network. A poorly designed client could be
hijacked into a back door for hackers and miscreants. By simply downloading and
installing the application, volunteers implicitly trust the authors of the
software not to do nasty things to their computers. After all, these programs
are usually written by research groups, not software companies, and do not go
through the same level of QA testing as commercial software. How can you be
certain a client will not accidentally format the hard drive? Perhaps a
particularly malicious programmer could release a DC application that secretly
steals your credit card information the next time you enter it into a web
form. None of this is beyond the realm of possibility.

Server security is another issue altogether. Most projects seek to answer a
question or solve some scientific problem. The experiment is compromised
unless the integrity of results returned by clients can be guaranteed.
Similarly, if someone breaks into the server and changes results, the
experiment is invalidated. Some users have also been clever enough to find ways
to cheat, for example, uploading the same "work units" multiple times. This
gives them extra credits and makes it appear as if they are doing lots of work.
While perhaps impressing their friends, this sort of behavior can often be
destructive to the project, biasing or perhaps completely ruining results
computed on the server end. The SETI@Home project has had
several problems with users cheating and exploiting loopholes, and the
project managers' delay in closing those holes cost them many users, who
became upset by the rampant cheating.

The purpose of this article is not to make you paranoid about running DC
projects or to turn you off to them. By all means, donate your CPU cycles to
worthy projects! It is also not to reveal any secrets or security holes of
existing DC projects; any security compromised by this article was never
really secure to begin with. I do not claim to be an expert in computer and
network security, by any stretch of the imagination. However, with new projects
appearing weekly, you should be cautious and evaluate new projects from a
security standpoint before signing up. The following sections discuss things to
look for to ensure a DC project is secure, as well as some things to do to
improve the security of your own DC project.

Signatures and Hashes

The server will often need to send data to the clients running the DC
application. Any time data is sent to a machine, security measures
must be put in place. This data may be new work to process or a new version of
the application; either way, sending unprotected raw data over the network is
just asking for trouble. It would be relatively easy for a third party to pose
as the server and deliver arbitrary code to your computer, especially in the
case of client updates to the executable, which would then be automatically (or
manually) executed. This may be a lesser problem on Unix if the software is
run as a user with minimal permissions, but clearly this is still
unacceptable.

This is where digital signatures come in handy. A digital signature is a way
of guaranteeing to a client that a certain message came from a trusted
individual. It works through the use of an asymmetric encryption keypair. A trusted
individual, usually the author of the DC client, has the private, secret key.
He signs the document to be protected with this key. The document is
then delivered to the client, who has the corresponding public key. The client
can verify the signature on the document using the public key. Only someone who
knows the private key could have signed the document such that the public key
verifies the signature and so the document must have come from the trusted
source. Most importantly, a document cannot be modified once signed or the
verification will fail.
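The sign-and-verify flow described above can be sketched with a toy
textbook-RSA keypair. Everything here is illustrative: the primes are
deliberately tiny, there is no padding scheme, and the digest is reduced
modulo n; a real project would use a vetted cryptographic library with
full-sized keys.

```python
import hashlib

# Toy textbook-RSA parameters (illustrative only -- far too small for real use).
p, q = 61, 53
n = p * q        # public modulus (3233)
e = 17           # public exponent
d = 2753         # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

def sign(message: bytes) -> int:
    """The signer hashes the message and transforms the digest with the private key."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    """The verifier recomputes the digest and checks it against the signature."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

msg = b"client update v1.2"
sig = sign(msg)
assert verify(msg, sig)                  # an untampered document verifies
assert not verify(msg, (sig + 1) % n)    # a tampered signature fails
```

Note that only the holder of d can produce a signature that e will verify,
which is exactly the property the article relies on.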

By signing all updates sent to users and embedding the public key in the DC
client, users can be reassured that no one can easily intercept or change
update files for their own purposes. As long as users trust the document
signer, their computers will be safe.

On the other side of the fence, project managers must ensure the integrity
of the data being returned by the users. They should be reasonably confident
that all incoming data is in fact being generated by the clients (and not
manually, to cheat), that work is not being duplicated, and that in fact data
integrity is maintained from when it is generated by the client until it is
stored on the server. While the latter point may sound a bit excessive,
remember thousands of people will likely be sending data to the server and you
must be prepared for the unexpected. Modern network connections are generally
quite reliable and transfer data flawlessly, but a bad cable or weak
connection along the way can still let a few bad bits sneak in. More commonly,
users may have overclocked CPUs or bad RAM chips (more often than you might
think). These can lead to corrupt data files, which will then be uploaded.
Depending on the exact nature of the problem, the data file may still look
entirely legitimate though it contains incorrect data. This is perhaps the
scariest problem of all for a DC project organizer.

Again, digital signatures can be used to identify packets as having come
from the DC client. As a bonus you get a free data integrity check along with
it at the server end. However, there is a catch. You must include the private
key with the client, so it can do the signing. A clever computer whiz could
then extract the private key, make some fake data files, and sign them, making
them indistinguishable from normal packets. Although this is difficult and
unlikely, you must consider all possibilities when dealing with security
issues.

A simpler approach in this case is to use hashes. A hash, or checksum, is
simply a function that takes an input, usually an arbitrarily complex set of
data, and outputs a short summary or representation of that information.
Popular hashing algorithms include SHA-1, MD5 (Message Digest 5), and CRC32
(cyclic redundancy check). Each of these digests input files or "messages",
producing a hexadecimal checksum value for each file: 32 bits in length for
CRC32 and 128 bits for MD5. Since there are a finite number of outputs (2^32
or 2^128) and an infinite number of inputs, the digest value is not guaranteed
to be unique. In practice, however, it is extremely unlikely that any two
files will have the same checksum unless they are carefully contrived to
produce such a collision. It is even less likely that the same checksum will
survive minor errors or changes to a file (due to faulty RAM, for example),
because of the way in which these digests are computed.

Sending a message hash along with the message allows the recipient to
compute the hash independently and validate the integrity of the message. This
detects messages corrupted in transit, and also forces the sender to know what
hash to send and when and where to send it. Again, a clever user
could likely find a way to forge an incoming message, sending the proper hash
with the upload to make it look as if it were coming from the client software,
but this will at least stop the average user from doing so. The hash can also
be used to track uploaded data, to avoid counting duplicates (for example, a
user trying to upload the same work twice for credit). To do this, only store
the hashes of previously uploaded work instead of all the work itself and check
newly uploaded work against the list of hashes to see whether it has already
been received before.
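Server-side, the bookkeeping for this duplicate check is minimal: keep a set
of digests of accepted work units and reject any repeat. The function name and
in-memory set below are illustrative assumptions, not any particular project's
API; a real server would keep the hashes in a database.

```python
import hashlib

seen_hashes = set()  # in practice this would live in a database table

def accept_work_unit(payload: bytes) -> bool:
    """Record the hash of each upload; reject any payload already received."""
    digest = hashlib.md5(payload).hexdigest()
    if digest in seen_hashes:
        return False          # duplicate upload -- no credit awarded
    seen_hashes.add(digest)
    return True

assert accept_work_unit(b"result #1")        # first upload is accepted
assert not accept_work_unit(b"result #1")    # resubmitting it is rejected
assert accept_work_unit(b"result #2")        # distinct work is still accepted
```

Storing only the 32-character digests, rather than the work units themselves,
keeps the duplicate check cheap no matter how large the uploads are.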

Because users possess the binary executable of the client, anything within it
that the client uses to identify itself to the server as the source of an
upload could theoretically be spoofed. Outgoing network packets can be sniffed
and, even if they are encrypted, the encryption key must then be present on
the user's machine. The best we can do is make this task non-trivial. However, this does
not mean the success of the project is in jeopardy. Logic and common sense will
normally still prevail. Regardless of the nature of the project, the designers
generally have some idea of what to expect, and in some cases can verify
results. For example when searching for new prime numbers, once one is
allegedly found, it can easily and quickly be validated by hand. When
generating molecular simulations, they can be tested for discontinuities or
unrealistic parameter values. The point is that, no matter how careful the project
managers may be, all results should still be manually verified at some point.
At a minimum, they must ask themselves "Are the results what we expected? Why or why not? Is this reasonable?" After all, if the results cannot be reproduced or verified, then from a scientific point of view, the experiment is
useless and poorly designed.
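For the prime-number case mentioned above, a server-side sanity check on a
claimed discovery takes only a few lines. Plain trial division is shown here
as a sketch; it is certain but slow, and a real project hunting very large
primes would use a probabilistic test instead.

```python
def is_prime(n: int) -> bool:
    """Trial division -- slow but certain, good enough to spot-check claims."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# A returned "discovery" can be re-verified before any credit is awarded.
assert is_prime(104729)       # the 10000th prime -- a legitimate result
assert not is_prime(104730)   # an obviously bogus claim
```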

Maintaining Project Integrity

Many projects are likely to use data files of some sort or another to help
them do their work. These may be parameter files, configuration files, and so
forth. While some, such as configuration files, may change over time, some will
not and are effectively read-only. These will be for the most part large tables
of numbers or words (a character set, an energy force field, dictionary, etc.),
which have not been directly hardcoded into the program for one reason or
another. To ensure the integrity of the project, these data files must be
protected from accidental modification. Binary files are less likely to be
changed than plain text ones, but both types should be protected from
unauthorized modification to avoid cheating or, more generally, invalid
results.

These files lie on the user's file system, however, so making them read-only
will stop only the most neophyte hacker. The solution? Again, checksums are our
friends. Simply compute MD5, CRC32, or other checksums for the correct files
and hard-code them into the program. If a checksum comparison fails, the
program should exit with an appropriate error message. Of course, someone
could always go into the binary with a hex editor, find the stored checksum,
and change it, but that requires a much more ambitious and knowledgeable
individual.
Similar measures can be taken to ensure that no one tampers with the binary
itself, such as having the program checksum its own executable or check its
date stamp and file size. Keep in mind that text files may have different
checksums and sizes on Windows than on Unix, due to the extra carriage return
characters in Windows text files. In this case you may want to compute
checksums for both environments and accept either one as valid.
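An alternative to storing two checksums is to normalize line endings before
hashing, so a single hard-coded digest is valid on both platforms. The
function names below are illustrative, and the expected digest would be
computed from the known-good file at build time.

```python
import hashlib
import os
import tempfile

def table_checksum(path: str) -> str:
    """Hash a text data file with Windows CRLF line endings normalized to LF."""
    with open(path, "rb") as f:
        raw = f.read()
    return hashlib.md5(raw.replace(b"\r\n", b"\n")).hexdigest()

# Demonstration: the same table saved with Unix and Windows line endings
# produces one and the same checksum after normalization.
unix_file = tempfile.NamedTemporaryFile(delete=False)
unix_file.write(b"1.0 2.0 3.0\n4.0 5.0 6.0\n")
unix_file.close()

win_file = tempfile.NamedTemporaryFile(delete=False)
win_file.write(b"1.0 2.0 3.0\r\n4.0 5.0 6.0\r\n")
win_file.close()

assert table_checksum(unix_file.name) == table_checksum(win_file.name)
os.unlink(unix_file.name)
os.unlink(win_file.name)
```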

Social Engineering

The biggest security hole in almost any application or environment is people.
No matter how strong your encryption, how many layers of security you add, or
how complex your code, the human element will always remain the one
unpredictable factor. Social engineering is the art of getting information out
of people, or getting them to do things for you, without them realizing it.
You may think, "I would never give away any secrets of my project if I ran a
DC project." But many very intelligent people have been fooled in the past by
clever social engineers, and many more will no doubt follow. Kevin D. Mitnick
wrote an excellent book on the subject, The Art of Deception: Controlling the
Human Element of Security.

As an example, suppose, as DC project technical support, you receive a
message from joeblow@hotmail.com, saying he forgot his password and would like
you to send it to him. This sounds perfectly innocent, so you look up his
password and send it to him. After all, he signed up with that address so the
password must belong to him. But you fail to notice that the reply-to address
is in fact joe_blow@hotmail.com, or perhaps a different address altogether.
You have just given joeblow's password away to a stranger, perhaps his biggest
competitor.

Through spoofed mail header fields (including From: and To:), trickery,
deception, and outright lying, an attacker can easily catch you off guard and
fool you into giving away information that you should not. As a general rule,
systems for sending lost passwords, registration, and so on should be fully
automated. After all, computers don't make mistakes and cannot be tricked into
revealing information they are not programmed to reveal.

If you must give out information manually, be
careful about what you give out. Keep written records of everything you give
out so you can always go back to it if there is a problem later, and if a
request sounds the least bit suspicious, check the message headers to see if
they appear to be spoofed. You can also contact the user directly to confirm
that they really sent the message in question; they will be glad you were
extra cautious before revealing any of their personal information. Lastly,
never send out requests yourself by email for people to provide you with any
sort of personal info. Legitimate companies never ask customers to reveal
private information by email. Neither should you.

Summary

In the end, security in a DC project boils down to
common sense. Always check the final results turned in to the
server. Results turned in that seem too good to be true or seem
like major outliers should be reproduced by hand. If that is not
possible, reconsider how the project generates
data in the first place. A non-reproducible experiment is not
science.

Users caught cheating or trying to compromise the integrity of the project
should be dealt with swiftly and removed from the project. However, be sure
the apparent cheating is indeed the fault of the user and not a bug in the
program code or even faulty computer hardware! Often a quick e-mail exchange
with the user will establish which is the case.

If you're still paranoid about security after reading this article,
there are firms that will perform professional security audits on any system
you desire. They will look for both software and hardware issues and inform
you of any areas they feel are insecure and need work. These are
professionals who do this for a living, and are generally quite good at what
they do. You can never be 100% certain that your system is secure from attack
but a thorough security audit, if you have the money for it, will get you 99%
of the way there.

Howard Feldman
is a research scientist at the Chemical Computing Group in Montreal, Quebec.