The Internet Worm Program: An Analysis
Purdue Technical Report CSD-TR-823
Eugene H. Spafford
Department of Computer Sciences Purdue University
West Lafayette, IN 47907-2004
spaf@cs.purdue.edu
ABSTRACT
On the evening of 2 November 1988, someone infected the Internet
with a worm program. That program exploited flaws in utility
programs in systems based on BSD-derived versions of UNIX. The
flaws allowed the program to break into those machines and copy
itself, thus infecting those systems. This infection eventually
spread to thousands of machines, and disrupted normal activities
and Internet connectivity for many days. This report gives a
detailed description of the components of the worm
program: data and functions. It is based on study of two
completely independent reverse-compilations of the worm and a
version disassembled to VAX assembly language. Almost no source
code is given in the paper because of current concerns about the
state of the ``immune system'' of Internet hosts, but the
description should be detailed enough to allow the reader to
understand the behavior of the program. The paper contains a
review of the security flaws exploited by the worm program, and
gives some recommendations on how to eliminate or mitigate their
future use. The report also includes an analysis of the coding
style and methods used by the author(s) of the worm, and draws
some conclusions about his abilities and intent.
Copyright 1988 by Eugene H. Spafford. All rights reserved.
Permission is hereby granted to make copies of this work, without
charge, solely for the purposes of instruction and research. Any
such copies must include a copy of this title page and copyright
notice. Any other reproduction, publication, or use is strictly
prohibited without express written permission. November 29, 1988
1. Introduction
On the evening of 2 November 1988 the Internet came under attack
from within. Sometime around 6 PM EST, a program was executed on
one or more hosts connected to the Internet. This program
collected host, network, and user information, then broke into
other machines using flaws present in those systems' software.
After breaking in, the program would replicate itself and the
replica would also attempt to infect other systems. Although the
program would only infect Sun Microsystems Sun 3 systems and VAX
computers running variants of 4 BSD UNIX, the program spread
quickly, as did the confusion and consternation of system
administrators and users as they discovered that their systems
had been infected. Although UNIX has long been known to have some
security weaknesses (cf. [Ritc79], [Gram84], and [Reid87]), the
scope of the breakins came as a great surprise to almost
everyone.

The program was mysterious to users at sites where it
appeared. Unusual files were left in the /usr/tmp directories of
some machines, and strange messages appeared in the log files of
some of the utilities, such as the sendmail mail handling agent.
The most noticeable effect, however, was that systems became more
and more loaded with running processes as they became repeatedly
infected. As time went on, some of these machines became so
loaded that they were unable to continue any processing; some
machines failed completely when their swap space or process
tables were exhausted. By late Wednesday night, personnel at the
University of California at Berkeley and at Massachusetts
Institute of Technology had ``captured'' copies of the program
and began to analyze it. People at other sites also began to
study the program and were developing methods of eradicating it.
A common fear was that the program was somehow tampering with
system resources in a way that could not be readily detected and
that while a cure was being sought, system files were being
altered or information destroyed. By 5 AM EST Thursday morning,
less than 12 hours after the infection started on the network,
the Computer Systems Research Group at Berkeley had developed an
interim set of steps to halt its spread. This included a
preliminary patch to the sendmail mail agent, and the suggestion
to rename one or both of the C compiler and loader to prevent
their use. These suggestions were published in mailing lists and
on the Usenet, although their spread was hampered by systems
disconnecting from the Internet to attempt a ``quarantine.''
By about 7 PM EST Thursday, another simple, effective method of
stopping the infection, without renaming system utilities, was
discovered at Purdue and also widely published. Software patches
were posted by the Berkeley group at the same time to mend all
the flaws that enabled the program to invade systems. All that
remained was to analyze the code that caused the problems. On
November 8, the National Computer Security Center held a
hastily-convened workshop in Baltimore. The topic of discussion
was the program and what it meant to the Internet community. Who
was at that meeting and why they were invited, and the topics
discussed have not yet been made public.
However, one thing we know that was decided by those present at
the meeting was that they would not distribute copies of their
reverse-engineered code to the general public. It was felt that
the program exploited too many little-known techniques and that
making it generally available would only provide other attackers
a framework to build another such program. Although such a stance
is well-intended, it can serve only as a delaying tactic. As of
November 27, I am aware of at least five versions of the
decompiled code, and because of the widespread distribution of
the binary, I am sure there are at least ten times that many
versions already completed or in progress. The required skills
and tools are too readily available within the community to
believe that only a few groups have the capability to reconstruct
the source code.

Many system administrators, programmers, and
managers are interested in how the program managed to establish
itself on their systems and spread so quickly. These individuals
have a valid interest in seeing the code, especially if they are
software vendors. Their interest is not to duplicate the program,
but to be sure that all the holes used by the program are
properly plugged. Furthermore, examining the code may help
administrators and vendors develop defenses against future
attacks, despite the claims to the contrary by some of the
individuals with copies of the reverse-engineered code. This
report is intended to serve an interim role in this process. It
is a detailed description of how the program works, but does not
provide source code that could be used to create a new worm
program. As such, this should be an aid to those individuals
seeking a better understanding of how the code worked, yet it is
in such a form that it cannot be used to create a new worm
without considerable effort. Section 3 and Appendix C contain
specific observations about some of the flaws in the system
exploited by the program, and their fixes. A companion report, to
be issued in a few weeks, will contain a history of the worm's
spread through the Internet. This analysis is the result of a
study performed on three separate reverse-engineered versions of
the worm code. Two of these versions are in C code, and one in
VAX assembler. All three agree in all but the most minor details.
One C version of the code compiles to binary that is identical to
the original code, except for minor differences of no
significance. As such, I can state with some certainty that if
there was only one version of the worm program, then it was
benign in intent. The worm did not write to the file system
except when transferring itself into a target system. It also did
not transmit any information from infected systems to any site,
other than copies of the worm program itself. Since the Berkeley
Computer Systems Research Group has already published official
fixes to the flaws exploited by the program, we do not have to
worry about these specific attacks being used again. Many vendors
have also issued appropriate patches. It now remains to convince
the remaining vendors to issue fixes, and users to install them.
2. Terminology
There seems to be considerable variation in the names applied to
the program described in this paper. I use the term worm instead
of virus based on its behavior. Members of the press have used
the term virus, possibly because their experience to date has
been only with that form of security problem. This usage has been
reinforced by quotes from computer managers and programmers also
unfamiliar with the terminology. For purposes of clarifying the
terminology, let me define the difference between these two terms
and give some citations to their origins.

A worm is a program that can run by itself and can propagate a
fully working version of itself to other machines. It is derived
from the word tapeworm, a parasitic organism that lives inside a
host and saps its resources to maintain itself.

A virus is a piece of code that adds itself to other programs,
including operating systems. It cannot run independently; it
requires that its ``host'' program be run to activate it. As
such, it has a clear analog to biological viruses: those viruses
are not considered alive in the usual sense; instead, they invade
host cells and corrupt them, causing them to produce new viruses.

The program that was loosed on the Internet was clearly a worm.
2.1. Worms
The concept of a worm program that spreads itself from machine to
machine was apparently first described by John Brunner in 1975
in his classic science fiction novel The Shockwave Rider.
[Brun75] He called these programs tapeworms that lived
``inside'' the computers and spread themselves to other
machines. In 1979-1981, researchers at Xerox PARC built and
experimented with worm programs. They reported their experiences
in an article in 1982 in Communications of the ACM. [Shoc82] The
worms built at PARC were designed to travel from machine to
machine and do useful work in a distributed environment. They
were not used at that time to break into systems, although some
did ``get away'' during the tests. A few people seem to prefer to
call the Internet Worm a virus because it was destructive, and
they believe worms are non-destructive. Not everyone agrees that
the Internet Worm was destructive, however. Since intent and
effect are sometimes difficult to judge, using those as a naming
criterion is clearly insufficient. As such, worm continues to be
the clear choice to describe this kind of program.
2.2. Viruses
The first use of the word virus (to my knowledge) to describe
something that infects a computer was by David Gerrold in his
science fiction short stories about the G.O.D. machine. These
stories were later combined and expanded to form the book
When Harlie Was One. [Gerr72] A subplot in that book described a
program named VIRUS created by an unethical scientist. A
computer infected with VIRUS would randomly dial the phone until
it found another computer. It would then break into that system
and infect it with a copy of VIRUS. This program would infiltrate
the system software and slow the system down so much that it
became unusable except to infect other machines. The inventor
had plans to sell a program named VACCINE that could cure VIRUS
and prevent infection, but disaster occurred when noise on a
phone line caused VIRUS to mutate so VACCINE ceased to be
effective. The term computer virus was first used in a formal
way by Fred Cohen at USC. [Cohe84] He defined the term to mean
a security problem that attaches itself to other code and turns
it into something that produces viruses; to quote from his
paper: ``We define a computer `virus' as a program that can
infect other programs by modifying them to include a possibly
evolved copy of itself.'' He claimed the first computer virus was
``born'' on November 3, 1983, written by himself for a security
seminar course.
The interested reader may also wish to consult [Denn88] and
[Dewd85] for further discussion of the terms.
3. Flaws and Misfeatures
3.1. Specific Problems
The actions of the Internet Worm exposed some specific security
flaws in standard services provided by BSD-derived versions of
UNIX. Specific patches for these flaws have been widely
circulated in the days since the worm program attacked the Internet.
Those flaws and patches are discussed here.
3.1.1. fingerd and gets
The finger program is a utility that allows users to obtain
information about other users. It is usually used to identify
the full name or login name of a user, whether or not a user is
currently logged in, and possibly other information about the
person such as telephone numbers where he or she can be reached.
The fingerd program is intended to run as a daemon, or background
process, to service remote requests using the finger protocol.
[Harr77] The bug exploited to break fingerd involved overrunning
the buffer the daemon used for input. The standard C library has
a few routines that read input without checking for bounds on
the buffer involved. In particular, the gets call takes input to
a buffer without doing any bounds checking; this was the call
exploited by the worm. The gets routine is not the only routine
with this flaw. The family of routines scanf/fscanf/sscanf may
also overrun buffers when decoding input unless the user
explicitly specifies limits on the number of characters to be
converted. Incautious use of the sprintf routine can overrun
buffers. Use of the strcat/strcpy calls instead of the
strncat/strncpy routines may also overflow their buffers.
Although experienced C programmers are aware of the problems with
these routines, they continue to use them. Worse, their format
is in some sense codified not only by historical inclusion in
UNIX and the C language, but more formally in the forthcoming
ANSI language standard for C. The hazard with these calls is
that any network server or privileged program using them may
possibly be compromised by careful precalculation of the
inappropriate input. An important step in removing this hazard
would be first to develop a set of replacement calls that accept
values for bounds on their program-supplied buffer arguments.
Next, all system servers and privileged applications should be
examined for unchecked uses of the original calls, with those
calls then being replaced by the new bounded versions. Note that
this audit has already been performed by the group at Berkeley;
only the fingerd and timed servers used the gets call, and
patches to fingerd have already been posted. Appendix C contains
a new version of fingerd written specifically for this report
that may be used to replace the original version. This version
makes no calls to gets.
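
The kind of bounded replacement call suggested above can be
sketched as follows. This is a minimal illustration, not the code
from Appendix C: the function name read_request_line and its
truncation policy are assumptions made for the example. It relies
only on the standard fgets routine, which takes the buffer size as
an argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line from `in` into `buf`, never storing more than `size`
 * bytes (including the terminating NUL). Returns the length of the
 * stored line without its newline, or -1 on end-of-file or read error.
 * Input longer than the buffer is truncated and the rest of that line
 * is discarded. (Hypothetical helper, not taken from fingerd.)
 */
int read_request_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    assert(in != NULL && buf != NULL && size > 1);

    /* fgets stores at most size - 1 characters, unlike gets */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';      /* complete line: strip the newline */
    } else {
        /* line too long: drain the excess so it is not taken as new input */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return (int)len;
}
```

A server would call this with the address and size of its own
input buffer, for example read_request_line(stdin, line, sizeof
line), so that the bound always matches the buffer actually
declared.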
3.1.2. Sendmail
The sendmail program is a mailer designed to route mail in a
heterogeneous internetwork. [Allm83] The program operates in a
number of modes, but the one of most interest is when it is
operating as a daemon process. In this mode, the program is
``listening'' on a TCP port (#25) for attempts to deliver
mail using standard Internet protocols, principally SMTP
(Simple Mail Transfer Protocol). [Post82] When such a request
is detected, the daemon enters into a dialog with the remote
mailer to determine sender, recipient, delivery instructions, and
message contents. The bug exploited in sendmail had to do with
functionality provided by a debugging option in the code. The
worm would issue the DEBUG command to sendmail and then specify
a set of commands instead of a user address as the recipient of
the message. Normally, this is not allowed, but it is present
in the debugging code to allow testers to verify that mail is
arriving at a particular site without the need to activate the
address resolution routines. The debug option of sendmail is
often used because of the complexity of configuring the mailer
for local conditions, and many vendors and site administrators
leave the debug option compiled in. The sendmail program is of
immense importance on most Berkeley-derived (and other) UNIX
systems because it handles the complex tasks of mail routing and
delivery. Yet, despite its importance and widespread use, most
system administrators know little about how it works. Stories
are often related about how system administrators will attempt to
write new device drivers or otherwise modify the kernel of the
OS, yet they will not willingly attempt to modify sendmail or
its configuration files. It is little wonder, then, that bugs
are present in sendmail that allow unexpected behavior. Other
flaws have been found and reported now that attention has been
focused on the program, but it is not known for sure if all the
bugs have been discovered and all the patches circulated. One
obvious approach would be to dispose of sendmail and come up
with a simpler program to handle mail. Actually, for purposes
of verification, developing a suite of cooperating programs
would be a better approach, and more aligned with the UNIX
philosophy. In effect, sendmail is fundamentally flawed, not
because of anything related to function, but because it is too
complex and difficult to understand.
The Berkeley Computer Systems Research Group has a new version of
sendmail with many bug fixes and fixes for security flaws. This
version of sendmail is available for FTP from the host
``ucbarpa.berkeley.edu'' and will be present in the file
~ftp/pub/sendmail.tar.Z by the end of November 1988. Note that
this version is shipped with the DEBUG option disabled by
default. However, this does not help system administrators who
wish to enable the DEBUG option, although the researchers at
Berkeley believe they have fixed all the security flaws
inherent in that facility. One approach that could be taken with
the program would be to have it prompt the user for the password
of the super user (root) when the DEBUG command is given. A
static password should never be compiled into the program because
this would mean that the same password might be present at
multiple sites and seldom changed. For those sites without
access to FTP or otherwise unable to obtain the new version, the
official patches to sendmail are enclosed in Appendix D.
3.2. Other Problems
Although the worm exploited flaws in only two server programs,
its behavior has served to illustrate a few fundamental problems
that have not yet been widely addressed. In the interest of
promoting better security, some of these problems are discussed
here. The interested reader is directed to works such as
[Gram84] for a broader discussion of related issues.
3.2.1. Servers in general
A security flaw not exploited by the worm, but now becoming
obvious, is that many system services have configuration and
command files owned by the same userid. Programs like sendmail,
the at service, and other facilities are often all owned by the
same non-user id. This means that if it is possible to abuse
one of the services, it might be possible to abuse many. One way
to deal with the general problem is to have every daemon and
subsystem run with a separate userid. That way, the command and
data files for each subsystem could be protected in such a way
that only that subsystem could have write (and perhaps read)
access to the files. This is effectively an implementation of
the principle of least privilege. Although doing this might add
an extra dozen user ids to the system, it is a small cost to
pay, and is already supported in the UNIX paradigm. Services
that should have separate ids include sendmail, news, at,
finger, ftp, uucp and YP.
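
As a rough sketch of the separate-userid approach described above,
a daemon started by root could switch to its own dedicated ids
before opening any command or data files. The function name
run_as_service_id is hypothetical; setgid, setuid, and the getuid
family are the standard UNIX calls.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch the calling process to the given group and user ids.
 * The group is changed first: once the user id is no longer
 * privileged, the process may not be allowed to change its group.
 * Returns 0 on success, -1 if either change fails or does not
 * take effect. (Hypothetical helper for illustration.)
 */
int run_as_service_id(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* confirm the change before the daemon touches any files */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;
    return 0;
}
```

A production daemon would also reset its supplementary group list
before giving up root; that step is omitted here to keep the
sketch short.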
3.2.2. Passwords
A key attack of the worm program involved attempts to discover
user passwords. It was able to determine success because the
encrypted password of each user was in a publicly readable file.
This allows an attacker to encrypt lists of possible passwords
and then compare them against the actual passwords without
passing through any system function. In effect, the security of
the passwords is provided in large part by the prohibitive effort
of trying all combinations of letters. Unfortunately, as machines
get faster, the cost of such attempts decreases. Dividing the
task among multiple processors further reduces the time needed to
decrypt a password. It is currently feasible to use a
supercomputer to precalculate all probable passwords and store
them on optical media. Although not (currently) portable, this
scheme would allow someone with the appropriate resources access
to any account for which they could read the password field and
then consult their database of pre-encrypted passwords. As the
density of storage media increases, this problem will only get
more severe. A clear approach to reducing the risk of such
attacks, and an approach that has already been taken in some
variants of UNIX, would be to have a shadow password file. The
encrypted passwords are saved in a file that is readable only by
the system administrators, and a privileged call performs
password encryptions and comparisons with an appropriate delay
(.5 to 1 second, for instance). This would prevent any attempt
to ``fish'' for passwords. Additionally, a threshold could be
included to check for repeated password attempts from the same
process, resulting in some form of alarm being raised. Shadow
password files should be used in combination with encryption
rather than in place of such techniques, however, or one problem
is simply replaced by a different one; the combination of the
two methods is stronger than either one alone. Another way to
strengthen the password mechanism would be to change the utility
that sets user passwords. The utility currently makes minimal
attempt to ensure that new passwords are nontrivial to guess. The
program could be strengthened in such a way that it would reject
any choice of a word currently in the on-line dictionary or based
on the account name.
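
The stricter password-setting check suggested above might look
something like the following sketch. The function name
password_is_guessable is hypothetical, the dictionary is passed in
as an in-memory array rather than read from the on-line
dictionary, and "based on the account name" is interpreted
narrowly (the name itself or the name reversed, ignoring case);
a real utility would likely test more variations.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string comparison; returns 1 if equal, else 0. */
static int same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/*
 * Returns 1 if `candidate` should be rejected because it equals the
 * account name, the account name reversed, or any of the `nwords`
 * entries in `dict` (all ignoring case). Returns 0 otherwise.
 */
int password_is_guessable(const char *candidate, const char *account,
                          const char *const dict[], size_t nwords)
{
    char reversed[64];
    size_t i, n;

    assert(candidate != NULL && account != NULL);

    if (same_word(candidate, account))
        return 1;

    /* build the reversed account name if it fits in the local buffer */
    n = strlen(account);
    if (n < sizeof reversed) {
        for (i = 0; i < n; i++)
            reversed[i] = account[n - 1 - i];
        reversed[n] = '\0';
        if (same_word(candidate, reversed))
            return 1;
    }

    for (i = 0; i < nwords; i++)
        if (same_word(candidate, dict[i]))
            return 1;
    return 0;
}
```

The password-setting utility would call such a check before
accepting a new password and ask the user to choose again when it
returns 1.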
4. High-Level Description of the Worm
This section contains a high-level overview of how the worm
program functions. The description in this section assumes that
the reader is familiar with UNIX and somewhat familiar with
network facilities under UNIX. Section 5 describes the individual
functions and structures in more detail. The worm consists of
two parts: a main program, and a bootstrap or vector program
(described in Appendix B). We will start our description from
the point at which a host is about to be infected. At this
point, a worm running on another machine has either succeeded in
establishing a shell on the new host and has connected back to
the infecting machine via a TCP connection, or it has connected
to the SMTP port and is transmitting to the sendmail program.
The infection proceeded as follows:

1) A socket was established on the infecting machine for the
   vector program to connect to (e.g., socket number 32341). A
   challenge string was constructed from a random number (e.g.,
   8712440). A file name base was also constructed using a random
   number (e.g., 14481910).

2) The vector program was installed and executed using one of two
   methods:

2a) Across a TCP connection to a shell, the worm would send the
    following commands (the two lines beginning with ``cc'' were
    sent as a single line):

        PATH=/bin:/usr/bin:/usr/ucb
        cd /usr/tmp
        echo gorch49; sed '/int zz/q' > x14481910.c;echo gorch50
        [text of vector program, enclosed in Appendix B]
        int zz;
        cc -o x14481910 x14481910.c;./x14481910 128.32.134.16 32341 8712440;
        rm -f x14481910 x14481910.c;echo DONE

    Then it would wait for the string ``DONE'' to signal that the
    vector program was running.

2b) Using the SMTP connection, it would transmit (the two lines
    beginning with ``cc'' were sent as a single line):

        debug
        mail from:
        rcpt to:
        data
        cd /usr/tmp
        cat > x14481910.c <

[...] from user .rhosts files early on. It also did not attempt to
collect host names from other user and system files containing
such names (e.g., /etc/hosts.lpd). Many of the operations could
have been done ``smarter.'' The case of using linear structures
has already been mentioned. Another example would have been to
sort user passwords by the salt used. If the same salt was
present in more than one password, then all those passwords could
be checked in parallel as a single pass was made through the
dictionaries. On our machine, 5% of the 200 passwords share the
same salts, for instance. No special advantage was taken if the
root password was compromised. Once the root password has been
broken, it is possible to fork children that set their uid and
environment variables to match each designated user. These
processes could then attempt the rsh attack described earlier in
this report. Instead, root is treated as any other account. It
has been suggested to me that this treatment of root may have
been a conscious choice of the worm author(s). Without knowing
the true motivation of the author, this is impossible to decide.
However, considering the design and intent of the program, I find
it difficult to believe that such exploitation would have been
omitted if the author had thought of it. The same attack used on
the finger daemon could have been extended to the Sun version of
the program, but was not. The only explanations that come to mind
why this was not done are that the author lacked the motivation,
the ability, the time, or the resources to develop a version for
the Sun. However, at a recent meeting, Professor Rick Rashid of
Carnegie-Mellon University was heard to claim that Robert T.
Morris, the alleged author of the worm, had revealed the fingerd
bug to system administrative staff at CMU well over a year ago.
Assuming this report is correct and the worm author is indeed Mr.
Morris, it is obvious that there was sufficient time to construct
a Sun version of the code. In fact, I asked three Purdue graduate
students (Shawn D. Ostermann, Steve J. Chapin, and Jim N.
Griffoen) to develop a Sun 3 version of the attack, and they did
so in under three hours. The Worm author certainly must have had
access to Suns or else he would not have been able to provide Sun
binaries to accompany the operational worm. Motivation should
also not be a factor considering everything else present in the
program. With time and resources available, the only reason I
cannot immediately rule out is that he lacked the knowledge of
how to implement a Sun version of the attack. This seems unlikely,
but given the inconsistent nature of the rest of the code, it is
certainly a possibility. However, if this is the case, it raises
a new question: was the author of the Worm the original author of
the VAX fingerd attack? Perhaps the most obvious shortcoming of
the code is the lack of understanding about propagation and load.
The reason the worm was spotted so quickly and caused so much
disruption was because it replicated itself exponentially on some
networks, and because each worm carried no history with it.
Admittedly, there was a check in place to see if the current
machine was already infected, but one out of every seven worms
would never die even if there was an existing infestation.
Furthermore, worms marked for self-destruction would continue to
execute up to the point of having made at least one complete pass
through the password file. Many approaches could have been taken
by the author(s) to slow the growth of the worm or prevent
reinfestation; little is to be gained from explaining them here,
but their absence from the worm program is telling. Either the
author(s) did not have any understanding of how the program
would propagate, or else she/he/they did not care; the existence
in the Worm of mechanisms to limit growth tends to indicate that
it was a lack of understanding rather than indifference. Some of
the algorithms used by the Worm were reasonably clever. One in
particular is interesting to note: when trying passwords from the
built-in list, or when trying to break into connected hosts, the
worm would randomize the list of candidates for trial. Thus, if
more than one worm were present on the local machine, they would
be more likely to try candidates in a different order, thus
maximizing their coverage. This implies, however (as does the
action of the pleasequit variable) that the author(s) was not
overly concerned with the presence of multiple worms on the same
machine. More to the point, multiple worms were allowed for a
while in an effort to maximize the spread of the infection. This
also supports the contention that the author did not understand
the propagation or load effects of the Worm. The design of the
vector program, the ``thinning'' protocol, and the use of the
internal state machine were all clever and non-obvious. The
overall structure of the program, especially the code associated
with IP addresses, indicates considerable knowledge of networking
and the routines available to support it. The knowledge evidenced
by that code would indicate extensive experience with networking
facilities. This, coupled with some of the errors in the Worm
code related to networking, further supports the thesis that the
author was not a careful programmer; the errors in those parts
of the code were probably not errors because of ignorance or
inexperience.
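
The randomized candidate ordering noted earlier in this section
can be illustrated with a generic in-place shuffle. This is a
textbook Fisher-Yates shuffle offered only to show the idea; it is
not a reconstruction of the worm's routine, and the name
shuffle_candidates is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Shuffle the `n` entries of `items` in place (Fisher-Yates).
 * Each pass picks one of the entries not yet placed and swaps it
 * into the last unplaced slot. rand() % i is slightly biased and
 * is used here only to keep the sketch short.
 */
void shuffle_candidates(const char *items[], size_t n)
{
    size_t i, j;
    const char *tmp;

    assert(items != NULL || n == 0);

    for (i = n; i > 1; i--) {
        j = (size_t)rand() % i;     /* 0 <= j < i */
        tmp = items[i - 1];
        items[i - 1] = items[j];
        items[j] = tmp;
    }
}
```

Two processes that seed the generator differently will usually
walk the same list in different orders, which is the coverage
effect described above.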
6.3. Camouflage
Great care was taken to prevent the worm program from being
stopped. This can be seen by the caution with which new files
were introduced into a machine, including the use of random
challenges. It can be seen by the fact that every string compiled
into the worm was encrypted to prevent simple examination. It was
evidenced by the care with which files associated with the worm
were deleted from disk at the earliest opportunity, and the
corresponding contents were encrypted in memory when loaded. It
was evidenced by the continual forking of the process, and the
(faulty) check for other instances of the worm on the local
host. The code also evidences precautions against providing
copies of itself to anyone seeking to stop the worm. It sets its
resource limits so it cannot dump a core file, and it keeps
internal data encrypted until used. Luckily, there are other
methods of obtaining core files and data images, and researchers
were able to obtain all the information they needed to
disassemble and reverse-engineer the code. There is no doubt,
however, that the author(s) of the worm intended to make such a
task as difficult as possible.
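
For readers unfamiliar with the resource-limit mechanism mentioned
above, the following sketch shows how a process on a BSD-derived
system can prevent core files from being written. The text here
says only that the worm set its resource limits; the use of
setrlimit with RLIMIT_CORE and the function name
disable_core_dumps are assumptions made for illustration.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Set the core-file size limit of the calling process to zero, so
 * that no core image is written if the process aborts.
 * Returns 0 on success, -1 on failure.
 */
int disable_core_dumps(void)
{
    struct rlimit lim;

    lim.rlim_cur = 0;   /* soft limit */
    lim.rlim_max = 0;   /* hard limit; raising it again needs privilege */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

The same call is commonly used by legitimate programs that hold
sensitive data in memory and do not want it written to disk.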
6.4. Specific Comments
Some more specific comments are worth making. These are directed
to particular aspects of the code rather than the program as a
whole.
6.4.1. The sendmail attack
Many sites tend to experience substantial loads because of heavy
mail traffic. This is especially true at sites with mailing list
exploders. Thus, the administrators at those sites have
configured their mailers to queue incoming mail and process the
queue periodically. The usual configuration is to set sendmail to
run the queue every 30 to 90 minutes. The attack through sendmail
would fail on these machines unless the vector program were
delivered into a nearly empty queue within 120 seconds of it
being processed. The reason for this is that the infecting worm
would only wait on the server socket for two minutes after
delivering the ``infecting mail.'' Thus, on systems with delayed
queues, the vector process would not get built in time to
transfer the main worm program over to the target. The vector
process would fail in its connection attempt and exit with a
non-zero status. Additionally, the attack through sendmail
invoked the vector program without a specific path. That is, the
program was invoked with ``foo'' instead of ``./foo'' as was done
with the shell-based attack. As a result, on systems where the
default path used by sendmail's shell did not contain the current
directory (``.''), the invocation of the code would fail. It
should be noted that such a failure interrupts the processing of
subsequent commands (such as the rm of the files), and this may
be why many system administrators discovered copies of the vector
program source code in their /usr/tmp directories.
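The two-minute window described above can be illustrated with a short C sketch. It is not taken from the worm; the function name wait_readable and its return convention are assumptions made for this illustration. It uses select() with a timeout to wait until a descriptor becomes readable, which for a listening socket means that a connection is pending.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if it is readable, 0 if the timeout expired, and
 * -1 on error. (Hypothetical helper, not code from the worm.)
 */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;              /* fd_set cannot represent this descriptor */

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* For a listening socket, "readable" means a connection is pending. */
    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    return ready > 0 ? 1 : ready;
}
```

A server in the infecting worm's position would call something like wait_readable(listen_fd, 120) and abandon the attempt unless the result were 1. On a host that runs its mail queue only every 30 to 90 minutes, the vector program would usually not be built and connect back within that window.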
6.4.2. The machines involved
As has already been noted, this attack was made only on Sun 3
machines and VAX machines running BSD UNIX. It has been observed
in at least one mailing list that had the Sun code been compiled
with the -mc68010 flag, more Sun machines would have fallen
victim to the worm. It is a matter of some curiosity why more
machines were not targeted for this attack. In particular, there
are many Pyramid, Sequent, Gould, Sun 4, and Sun i386 machines on
the net.[16]
If binary files for those had also been included, the worm could
have spread much further. As it was, some locations such as Ohio
State were completely spared the effects of the worm because all
their ``known'' machines were of a type that the worm could not
infect. Since the author of the program knew how to break into
arbitrary UNIX machines, it seems odd that he/she did not attempt
to compile the program on foreign architectures to include with
the worm.
6.4.3. Portability considerations
The author(s) of the worm may not have had much experience with
writing portable UNIX code, including shell scripts. Consider
that in the shell script used to compile the vector, the
following command is used:

    if [ -f sh ]

The use of the [ character as a synonym for the test function is
not universal. UNIX users with experience writing portable shell
files tend to spell out the operator test rather than rely on
there being a link to a file named ``['' on any particular system.
They also know that the test operator is built-in to many shells
and thus faster than the external [ variant. The test invocation used in the worm
code also uses the -f flag to test for presence of the file named
sh. This provided us with the worm ``condom'' published Thursday
night:[17] creating a directory with the name sh in /usr/tmp causes
this test to fail, as do later attempts to create executable files
by that name. Experienced shell programmers tend to use the -e
(exists) flag in circumstances such as this, to detect not only
directories, but sockets, devices, named FIFOs, etc. Other
colloquialisms are present in the code that bespeak a lack of
experience writing portable code. One such example is the code
loop where file units are closed just after the vector program
starts executing, and again in the main program just after it
starts executing. In both programs, code such as the following is
executed:

    for (i = 0; i < 32; i++)
        close(i);

The portable way to accomplish the task of closing all file
descriptors (on Berkeley-derived systems) is to execute:

    for (i = 0; i < getdtablesize(); i++)
        close(i);

or the even more efficient

    for (i = getdtablesize()-1; i >= 0; i--)
        close(i);

This is because the number of file units available (and thus open)
may vary from system to system.
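The difference between the -f and -e tests discussed earlier in this section can be sketched in C with stat(). This is an illustration only, not code from the worm or from any shell; the function names is_regular_file and path_exists are assumptions. The first corresponds roughly to what test -f checks and the second to what test -e checks.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Roughly what `test -f path` checks: path exists and is a regular file. */
int is_regular_file(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Roughly what `test -e path` checks: path exists, whatever its type. */
int path_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}
```

For a directory named sh, is_regular_file() reports false while path_exists() reports true. That is why a directory of that name in /usr/tmp made the worm's -f test fail even though something called sh was present.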
6.5. Summary
Many other examples can be drawn from the code, but the points
should be obvious by now: the author of the worm program may have
been a moderately experienced UNIX programmer, but s/he was by no
means the ``UNIX Wizard'' many have been claiming. The code
employs a few clever techniques and tricks, but there is some
doubt if they are all the original work of the Worm author. The
code seems to be the product of an inexperienced or sloppy
programmer. The person (or persons) who put this program
together appears to lack fundamental insight into some
algorithms, data structures, and network propagation, but at the
same time has some very sophisticated knowledge of network
features and facilities. The code does not appear to have been
tested (although anything other than unit testing would not be
simple to do), or else it was prematurely released. Actually, it
is possible that both of these conclusions are correct. The
presence of so much dead and duplicated code coupled with the
size of some data structures (such as the 20-slot object code
array) argues that the program was intended to be more
comprehensive.
7. Conclusions
It is clear from the code that the worm was deliberately designed
to do two things: infect as many machines as possible, and be
difficult to track and stop. There can be no question of this
having been in any way an accident, although its release may have
been premature. It is still unknown if this worm, or a future
version of it, was to accomplish any other tasks. Although an
author has been alleged (Robert T. Morris), he has not publicly
confessed
nor has the matter been definitively proven. Considering the
probability of both civil and criminal legal actions, a
confession and an explanation are unlikely to be forthcoming any
time soon. Speculation has centered on motivations as diverse as
revenge, pure intellectual curiosity, and a desire to impress
someone. This must remain speculation for the time being,
however, since we do not have access to a definitive statement
from the author(s). At the least, there must be some question
about the psychological makeup of someone who would build and run
such software.[18] Many people have stated that the authors of this
code[19] must have been ``computer geniuses'' of some sort. I have been
bothered by that supposition since first hearing it, and after
having examined the code in some depth, I am convinced that this
program is not evidence to support any such claim. The code was
apparently unfinished and done by someone clever but not
particularly gifted, at least in the way we usually associate
with talented programmers and designers. There were many bugs and
mistakes in the code that would not be made by a careful,
competent programmer. The code does not evidence clear
understanding of good data structuring, algorithms, or even of
security flaws in UNIX. It does contain clever exploitations of
two specific flaws in system utilities, but that is hardly
evidence of genius. In general, the code is not that impressive,
and its ``success'' was probably due to a large amount of luck
rather than any programming skill possessed by the author. Chance
favored most of us, however. The effects of this worm were
(largely) benign, and it was easily stopped. Had the code been
tested and developed further by someone more experienced, or had
it been coupled with something destructive, the toll would have
been considerably higher. I can easily think of several dozen
people who could have written this program, and not only done it
with far fewer (if any) errors, but made it considerably more
virulent. Thankfully, those individuals are all responsible,
dedicated professionals who would not consider such an act. What
we learn from this about securing our systems will help determine
if this is the only such incident we ever need to analyze. This
attack should also point out that we need a better mechanism in
place to coordinate information about security flaws and attacks.
The response to this incident was largely ad hoc, and resulted in
both duplication of effort and a failure to disseminate valuable
information to sites that needed it. Many site administrators
discovered the problem from reading the newspaper or watching the
television. The major sources of information for many of the
sites affected seem to have been Usenet news groups and a
mailing list I put together when the worm was first discovered.
Although useful, these methods did not ensure timely, widespread
dissemination of useful information, especially since they
depended on the Internet to work! Over three weeks after this
incident some sites are still not reconnected to the Internet.
This is the second time in six months that a major panic has hit
the Internet community. The first occurred in May when a rumor
swept the community that a ``logic bomb'' had been planted in Sun
software by a disgruntled employee. Many, many sites turned their
system clocks back or they shut off their systems to prevent
damage. The personnel at Sun Microsystems responded to this in an
admirable fashion, conducting in-house testing to isolate any
such threat, and issuing information to the community about how
to deal with the situation. Unfortunately, almost everyone else
seems to have watched events unfold, glad that they were not the
ones who had to deal with the situation. The worm has shown us
that we are all affected by events in our shared environment, and
we need to develop better information methods outside the network
before the next crisis. This whole episode should cause us to
think about the ethics and laws concerning access to computers.
The technology we use has developed so quickly it is not always
simple to determine where the proper boundaries of moral action
may be. Many senior computer professionals started their careers
years ago by breaking into computer systems at their colleges and
places of employment to demonstrate their expertise. However,
times have changed and mastery of computer science and computer
engineering now involves a great deal more than can be shown by
using intimate knowledge of the flaws in a particular operating
system. Entire businesses are now dependent, wisely or not, on
computer systems. People's money, careers, and possibly even
their lives may be dependent on the undisturbed functioning of
computers. As a society, we cannot afford the consequences of
condoning or encouraging behavior that threatens or damages
computer systems. As professionals, computer scientists and
computer engineers cannot afford to tolerate the romanticization
of computer vandals and computer criminals. This incident should
also prompt some discussion about distribution of security-
related information. In particular, since hundreds of sites have
``captured'' the binary form of the worm, and since personnel at
those sites have utilities and knowledge that enable them to
reverse-engineer the worm code, we should ask how long we expect
it to be beneficial to keep the code unpublished? As I mentioned
in the introduction, at least five independent groups have
produced reverse-engineered versions of the worm, and I expect
many more have been done or will be attempted, especially if the
current versions are kept private. Even if none of these versions
is published in any formal way, hundreds of individuals will have
had access to a copy before the end of the year. Historically,
trying to ensure security of software through secrecy has proven
to be ineffective in the long term. It is vital that we educate
system administrators and make bug fixes available to them in
some way that does not compromise their security. Methods that
prevent the dissemination of information appear to be completely
contrary to that goal. Last, it is important to note that the
nature of both the Internet and UNIX helped to defeat the worm as
well as spread it. The immediacy of communication, the ability to
copy source and binary files from machine to machine, and the
widespread availability of both source and expertise allowed
personnel throughout the country to work together to solve the
infection even despite the widespread disconnection of parts of
the network. Although the immediate reaction of some people might
be to restrict communication or promote a diversity of
incompatible software options to prevent a recurrence of a worm,
that would be entirely the wrong reaction. Increasing the
obstacles to open communication or decreasing the number of
people with access to in-depth information will not prevent a
determined attacker; it will only decrease the pool of
expertise and resources available to fight such an attack.
Further, such an attitude would be contrary to the whole purpose
of having an open, research-oriented network. The Worm was caused
by a breakdown of ethics as well as lapses in security; a
purely technological attempt at prevention will not address the
full problem, and may just cause new difficulties.
Acknowledgments
Much of this analysis was performed on reverse-engineered
versions of the worm code. The following people were involved in
the production of those versions: Donald J. Becker of Harris
Corporation, Keith Bostic of Berkeley, Donn Seeley of the
University of Utah, Chris Torek of the University of Maryland,
Dave Pare of FX Development, and the team at MIT: Mark W. Eichin,
Stanley R. Zanarotti, Bill Sommerfeld, Ted Y. Ts'o, Jon Rochlis,
Ken Raeburn, Hal Birkeland and John T. Kohl. A disassembled
version of the worm code was provided at Purdue by staff of the
Purdue University Computing Center, Rich Kulawiec in particular.
Thanks to the individuals who reviewed early drafts of this paper
and contributed their advice and expertise: Don Becker, Kathy
Heaphy, Brian Kantor, R. J. Martin, Richard DeMillo, and
especially Keith Bostic and Steve Bellovin. My thanks to all
these individuals. My thanks and apologies to anyone who should
have been credited and was not.
References
Allm83. Allman, Eric, Sendmail - An Internetwork Mail Router,
University of California, Berkeley, 1983. Issued with the BSD
UNIX documentation set.
Brun75. Brunner, John, The Shockwave Rider, Harper & Row, 1975.
Cohe84. Cohen, Fred, ``Computer Viruses: Theory and
Experiments,'' PROCEEDINGS OF THE 7TH NATIONAL COMPUTER SECURITY
CONFERENCE, pp. 240-263, 1984.
Denn88. Denning, Peter J., ``Computer Viruses,'' AMERICAN
SCIENTIST, vol. 76, pp. 236-238, May-June 1988.
Dewd85. Dewdney, A. K., ``A Core War Bestiary of viruses, worms,
and other threats to computer memories,'' SCIENTIFIC AMERICAN,
vol. 252, no. 3, pp. 14-23, May 1985.
Gerr72. Gerrold, David, When Harlie Was One, Ballantine Books,
1972. The first edition.
Gram84. Grampp, Fred T. and Robert H. Morris, ``UNIX Operating
System Security,'' AT&T BELL LABORATORIES TECHNICAL JOURNAL,
vol. 63, no. 8, part 2, pp. 1649-1672, Oct. 1984.
Harr77. Harrenstien, K., ``Name/Finger,'' RFC 742, SRI Network
Information Center, December 1977.
Morr79. Morris, Robert and Ken Thompson, ``UNIX Password
Security,'' COMMUNICATIONS OF THE ACM, vol. 22, no. 11, pp. 594-597,
ACM, November 1979.
Post82. Postel, Jonathan B., ``Simple Mail Transfer Protocol,''
RFC 821, SRI Network Information Center, August 1982.
Reid87. Reid, Brian, ``Reflections on Some Recent Widespread
Computer Breakins,'' COMMUNICATIONS OF THE ACM, vol. 30, no. 2,
pp. 103-105, ACM, February 1987.
Ritc79. Ritchie, Dennis M., ``On the Security of UNIX,'' in UNIX
SUPPLEMENTARY DOCUMENTS, AT&T, 1979.
Seel88. Seeley, Donn, ``A Tour of the Worm,'' TECHNICAL REPORT,
Computer Science Dept., University of Utah, November 1988.
Unpublished report.
Shoc82. Shoch, John F. and Jon A. Hupp, ``The Worm Programs -
Early Experience with a Distributed Computation,'' COMMUNICATIONS
OF THE ACM, vol. 25, no. 3, pp. 172-180, ACM, March 1982.
Appendix A The Dictionary
What follows is the mini-dictionary of words contained in the
worm. These were tried when attempting to break user passwords.
Looking through this list is, in some sense, revealing, but
actually raises a significant question: how was this list chosen?
The assumption has been expressed by many people that this list
represents words commonly used as passwords; this seems unlikely.
Common choices for passwords usually include fantasy characters,
but this list contains none of the likely choices (e.g.,
``hobbit,'' ``dwarf,'' ``gandalf,'' ``skywalker,'' ``conan'').
Names of relatives and friends are often used, and we see women's
names like ``jessica,'' ``caroline,'' and ``edwina,'' but no
instance of the common names ``jennifer'' or ``kathy.'' Further,
there are almost no men's names such as ``thomas'' or either of
``stephen'' or ``steven'' (or ``eugene''!). Additionally, none
of these have the initial letters capitalized, although that is
often how they are used in passwords. Also of interest, there are
no obscene words in this dictionary, yet many reports of
concerted password cracking experiments have revealed that there
are a significant number of users who use such words (or
phrases) as passwords. The list contains at least one incorrect
spelling: ``commrades'' instead of ``comrades''; I also believe
that ``markus'' is a misspelling of ``marcus.'' Some of the words
do not appear in standard dictionaries and are non-English names:
``jixian,'' ``vasant,'' ``puneet,'' etc. There are also some
unusual words in this list that I would not expect to be
considered common: ``anthropogenic,'' ``imbroglio,'' ``umesh,''
``rochester,'' ``fungible,'' ``cerulean,'' etc. I imagine that
this list was derived from some data gathering with a limited set
of passwords, probably in some known (to the author) computing
environment. That is, some dictionary-based or brute-force attack
was used to crack a selection of a few hundred passwords taken
from a small set of machines. Other approaches to gathering
passwords could also have been used: Ethernet monitors, Trojan
Horse login programs, etc. However they may have been cracked,
the ones that were broken would then have been added to this
dictionary. Interestingly enough, many of these words are not in
the standard on-line dictionary (in /usr/dict/words). As such,
these words are useful as a supplement to the main
dictionary-based attack the worm used as strategy #4, but I would
suspect them to be of limited use before that time. This unusual
composition might be useful in the determination of the
author(s) of this code. One approach would be to find a system
with a user or local dictionary containing these words. Another
would be to find some system(s) where a significant quantity of
passwords could be broken with this list.

aaa academia aerobics
airplane albany albatross albert alex alexander algebra aliases
alphabet ama amorphous analog anchor andromache animals answer
anthropogenic anvils anything aria ariadne arrow arthur athena
atmosphere aztecs azure bacchus bailey banana bananas bandit
banks barber baritone bass bassoon batman beater beauty beethoven
beloved benz beowulf berkeley berliner beryl beverly bicameral
bob brenda brian bridget broadway bumbling burgess campanile
cantor cardinal carmen carolina caroline cascades castle cat
cayuga celtics cerulean change charles charming charon chester
cigar classic clusters coffee coke collins commrades computer
condo cookie cooper cornelius couscous creation creosote cretin
daemon dancer daniel danny dave december defoe deluge desperate
develop dieter digital discovery disney dog drought duncan eager
easier edges edinburgh edwin edwina egghead eiderdown eileen
einstein elephant elizabeth ellen emerald engine engineer
enterprise enzyme ersatz establish estate euclid evelyn extension
fairway felicia fender fermat fidelity finite fishers flakes
float flower flowers foolproof football foresight format forsythe
fourier fred friend frighten fun fungible gabriel gardner
garfield gauss george gertrude ginger glacier gnu golfer gorgeous
gorges gosling gouge graham gryphon guest guitar gumption guntis
hacker hamlet handily happening harmony harold harvey hebrides
heinlein hello help herbert hiawatha hibernia honey horse horus
hutchins imbroglio imperial include ingres inna innocuous
irishman isis japan jessica jester jixian johnny joseph joshua
judith juggle julia kathleen kermit kernel kirkland knight ladle
lambda lamination larkin larry lazarus lebesgue lee leland leroy
lewis light lisa louis lynne macintosh mack maggot magic malcolm
mark markus marty marvin master maurice mellon merlin mets
michael michelle mike minimum minsky moguls moose morley mozart
nancy napoleon nepenthe ness network newton next noxious
nutrition nyquist oceanography ocelot olivetti olivia oracle orca
orwell osiris outlaw oxford pacific painless pakistan pam papers
password patricia penguin peoria percolate persimmon persona pete
peter philip phoenix pierre pizza plover plymouth polynomial
pondering pork poster praise precious prelude prince princeton
protect protozoa pumpkin puneet puppet rabbit rachmaninoff
rainbow raindrop raleigh random rascal really rebecca remote rick
ripple robotics rochester rolex romano ronald rosebud rosemary
roses ruben rules ruth sal saxon scamper scheme scott scotty
secret sensor serenity sharks sharon sheffield sheldon shiva
shivers shuttle signature simon simple singer single smile smiles
smooch smother snatch snoopy soap socrates sossina sparrows spit
spring springer squires strangle stratford stuttgart subway
success summer super superstage support supported surfer suzanne
swearer symmetry tangerine tape target tarragon taylor telephone
temptation thailand tiger toggle tomato topography tortoise
toyota trails trivial trombone tubas tuttle umesh unhappy unicorn
unknown urchin utility vasant vertigo vicky village virginia
warren water weenie whatnot whiting whitney will william
williamsburg willie winston wisconsin wizard wombat woodwind
wormwood yacov yang yellowstone yosemite zap zimmerman
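The observation that many of these words are absent from the standard on-line dictionary can be checked mechanically. The following C sketch counts how many entries of one word list do not occur in a second, sorted list. The function names count_missing and cmp_words, and the use of in-memory arrays rather than a file such as /usr/dict/words, are assumptions made for this illustration; nothing here comes from the worm.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two array elements of type "const char *" with strcmp(). */
static int cmp_words(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Count the entries of list[] that do not appear in dict[].
 * dict[] must already be sorted in strcmp() order, as bsearch()
 * requires. (Hypothetical helper for illustration.)
 */
size_t count_missing(const char *const list[], size_t nlist,
                     const char *const dict[], size_t ndict)
{
    size_t i, missing = 0;

    for (i = 0; i < nlist; i++) {
        if (bsearch(&list[i], dict, ndict, sizeof dict[0], cmp_words) == NULL)
            missing++;
    }
    return missing;
}
```

Loading the worm's list and a site's local dictionary into two such arrays would give a rough measure of how much this list adds to a plain dictionary attack on that site.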
Appendix B The Vector Program
The worm was brought over to each machine it infected via the
actions of a small program I call the vector program. Other
individuals have been referring to this as the grappling hook
program. Some people have referred to it as the program, since
that is the suffix used on each copy. The source for this program
would be transferred to the victim machine using one of the
methods discussed in the paper. It would then be compiled and
invoked on the victim machine with three command line arguments:
the canonical IP address of the infecting machine, the number of
the TCP port to connect to on that machine to get copies of the
main worm files, and a magic number that effectively acted as a
one-time-challenge password. If the ``server'' worm on the remote
host and port did not receive the same magic number back before
starting the transfer, it would immediately disconnect from the
vector program. This can only have been to prevent someone from
attempting to ``capture'' the binary files by spoofing a worm
``server.'' This code also goes to some effort to hide itself,
both by zeroing out the argument vector, and by immediately
forking a copy of itself. If a failure occurred in transferring a
file, the code deleted all files it had already transferred, then
it exited. One other key item to note in this code is that the
vector was designed to be able to transfer up to 20 files; it was
used with only three. This can only make one wonder if a more
extensive version of the worm was planned for a later date, and
if that version might have carried with it other command files,
password data, or possibly local virus or trojan horse programs.
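The one-time challenge described above can be sketched from the ``server'' side as follows. This is a minimal illustration, not the worm's code: the function name challenge_matches, the 4-byte size, and the network byte order of the value on the wire are all assumptions. The function reads the value sent back by the peer and compares it with the number originally handed out.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read a 4-byte challenge from fd and compare it with the value
 * that was handed out. Returns 1 on a match and 0 otherwise.
 * The wire format (4 bytes, network byte order) is assumed here
 * for illustration only.
 */
int challenge_matches(int fd, uint32_t expected)
{
    uint32_t wire;
    unsigned char *p = (unsigned char *)&wire;
    size_t got = 0;

    while (got < sizeof wire) {
        ssize_t n = read(fd, p + got, sizeof wire - got);
        if (n <= 0)
            return 0;   /* EOF or error before the full value arrived */
        got += (size_t)n;
    }
    return ntohl(wire) == expected;
}
```

A server that receives 0 from such a check would simply close the connection, which matches the disconnect behavior described above for a peer that does not return the expected magic number.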
Footnotes:
BSD is an acronym for Berkeley Software Distribution.
UNIX is a registered trademark of AT&T Laboratories.
VAX is a trademark of Digital Equipment Corporation.
The second edition of the book, just published, has been
``updated'' to omit this subplot about VIRUS.
5
It is probably a coincidence that the Internet Worm was loosed on
November 2, the eve of this ``birthday.''
6
Note that a widely used alternative to sendmail, MMDF, is also
viewed as too complex and large by many users. Further, it is
not perceived to be as flexible as sendmail if it is necessary
to establish special addressing and handling rules when bridging
heterogeneous networks.
7
Strictly speaking, the password is not encrypted. A block of zero
bits is repeatedly encrypted using the user password, and the
result of this encryption is what is saved. See [Morr79] for
more details.
8
Such a list would likely include all words in the dictionary, the
reverse of all such words, and a large collection of proper
names.
8
rexec is a remote command execution service. It requires that a
username/password combination be supplied as part of the
request.
9
This was compiled in as port number 23357, on host 127.0.0.1 (loopback).
10
Using TCP port 11357 on host 128.32.137.13.
11
Interestingly, although the program was coded to get the address
of the host on the remote end of point-to-point links, no use
seems to have been made of that information.
12
As if some of them aren't suspicious enough!
13
This appears to be a bug. The probable assumption was that the
routine hl would handle infection of local hosts, but hl calls
this routine! Thus, local hosts were never infected via this
route.
14
This is puzzling. The appropriate file to scan for equivalent
hosts would have been the .rhosts file, not the .forward file.
15
Private communication from someone present at the meeting.
16
The thought of a Sequent Symmetry or Gould NP1 infected with
multiple copies of the worm is awesome (and awful). The effects
noticed locally when the worm broke into a mostly unloaded VAX
8800 were spectacular. The effects on a machine with one or two
orders of magnitude more capacity are frightening to contemplate.
17
Developed by Kevin Braunsdorf and Rich Kulawiec at Purdue PUCC.
18
Rick Adams, of the Center for Seismic Studies, has commented that
we may someday hear that the worm was loosed to impress Jodie
Foster. Without further information, this is as valid a
speculation as any other, and should raise further disturbing
questions; not everyone with access to computers is rational and
sane, and future attacks may reflect this.
19
Throughout this paper I have been writing author(s) instead of
author. It occurs to me that most of the mail, Usenet postings,
and media coverage of this incident have assumed that it was
author (singular). Are we so unaccustomed to working together
on programs that this is our natural inclination? Or is it that
we find it hard to believe that more than one individual could
have such poor judgement? I also noted that most of the people I
spoke with seemed to assume that the worm author was male. I
leave it to others to speculate on the value, if any, of these
observations.