Appendix B: Data Gathering and the Order of Volatility

B.1 Introduction

In 1999 we wrote that forensic computing was "gathering and analyzing data
in a manner as free from distortion or bias as possible to reconstruct
data or what has happened in the past on a system." Trusting your tools
and data once you have them is problematic enough (Chapter 5 - "Systems
and subversion" - talks about this at length), but there is an even
greater problem. Due to the Heisenberg principle of data gathering and
system analysis (see section 1.4) even with the right and trusted tools
you cannot get all the data on a computer, so where should you start?
This appendix gives an overview of how to gather data on a computer as
well as the problems caused - mainly by the Order of Volatility, or OOV.

B.2 The basics of volatility

As we have demonstrated throughout the book, computers store a great
amount of information in a significant number of locations and layers.
Disk storage and RAM are the two most commonly thought data repositories,
but there are a great number of places - even outside the system if it
is connected to a network - that useful data can hide.

All data is volatile, however. As time passes the veracity of the
information goes down, and the ability to recall or validate the data also
decreases. When looking at stored information it is extremely difficult
to verify that it has not been subverted or changed.

That said, certain types of data are generally more persistent, or
long-lasting, than others. Backup tapes, for instance, can typically be
counted upon to remain unchanged longer than things in RAM - it is less
volatile. We call this hierarchy the order of volatility. At the
top you have some pieces of information becoming virtually impossible
to recover within a very short time - sometimes nanoseconds (or less)
of their inception date - such as data in CPU registers, frame buffers,
etc. At the bottom of the order you have things that are very persistent
and hard to change, such as stone tablets, printouts, and other ways of
imprinting data into a semi-permanent medium.

So in most cases you simply try to capture data while keeping this
order in mind - the more rapidly changing information should almost
always be preserved first. Table B.1, also seen in the first chapter,
gives a rough guide to the life expectancy of data.

Registers, peripheral memory, caches, etc.

nanoseconds

Main Memory

nanoseconds

Network state

milliseconds

Running processes

seconds

Disk

minutes

Floppies, backup media, etc.

years

CD-ROMs, printouts, etc.

tens of years

Table B.1: The expected lifespan of data.

Information lost from one layer may still be found in a lower layer (see
sections 1.5, "Layers and illusions" and 1.7, "Fossilization of deleted
information" for more about this.) But the point of OOV is the opposite:
doing something in one layer destroys information in all layers above it.
Simply executing a command to retrieve information destroys the contents
of registers, MMUs, physical memory and time stamps in the file system.

And if the act of starting up a program to read or capture the memory of
a system has the potential to cause a loss of existing data in memory
as the kernel allocates memory to run the program that performs the
examination in the first place. So what can you do?

B.3 Current state of the art

With our additional work on forensics we have decided to put
less emphasis on
the "in
a manner as free from distortion or bias as possible" from our definition
in the introduction to this appendix. We believe that by what can be seen
as risking digital evidence you are more likely to get additional data
and a better chance of addressing and understanding the problem at hand.

This goes against traditional forensic computing wisdom, which uses very
conservative methods - rarely more than turning off a computer and making
a copy of a system's disk [US DOJ, 2004]. Certainly if the goal is to
ensure that the data being collected is optimized for its admissibility
in a court of law and you've only got one shot at capturing it, a very
cautious methodology can be the best thing in some cases.

Unfortunately this misses a wealth of potentially useful information about
the situation, such as running processes and kernels, network state, data
in RAM, and more. Only a limited understanding can arise from looking
at a dead disk. And while perhaps dynamic information is a bit more
volatile and therefore suspect, any convictions based of a single
set of data readings are suspect as well. Certainly, as we've seen
throughout the book, no single data point should be trusted anyway.
Only by correlating data from many points can you begin to get a good
understanding of what happened, and there is a lot of data to look at
out there. It would be a pity to throw it away.

Gathering data according to the OOV helps in general preserve rather than
destroy, but unless computer architectures changes significantly there is
no single best way to capture digital evidence. For instance RAM might
be the first thing you'd like to save, but if you're in a remote site and
you have no local disk it could take hours to transfer it to a safe disk
somewhere else, and by the time you're done much of the anonymous memory
(the most ephemeral part - see chapter 8) could be long gone.

Certainly the current set of software tools to capture evidence is not
terribly compelling. Our own TCT, while at times useful, could be much
improved upon. Other packages, most notably SleuthKit [Carrier 2004],
and Encase [Encase 2004] are worthy efforts but still have far to go.
It's too bad that we have not progressed far beyond the erstwhile
dd copying program, but automated capture and analysis is
very difficult.

B.4 How to freeze a computer

The spirit of Darryl Zero (see section 1.1) infuses our mindset - if
you're looking for anything in particular, you are lost. But if you
keep your mind and eyes open you can go far.

The reproducibility and provability of results is difficult when dealing
with the capture of very complex systems that are constantly in motion.
Starting states of computers will always be different, often with
significant changes in operating system, kernel, and software versions
that are too complex for anyone to fully understand.

So ideally you would want both raw and cooked (processed) data.
Having the process table from a FreeBSD computer is of limited worth
if you don't have native programs for analysis, so the output from
ps is still important. You you will also want to gather data
on the system while it is still running as well as at rest, in case they
return different results. Volume also becomes problematic - where do you
store all this data? It's one thing storing a personal workstation, but
what happens when trying to analyze a Petabyte or Exabyte class server?

A thorough discussion on how to gather and store digital evidence would
perhaps warrant a book unto itself, but we shall try to give some basic
guidelines.

Richard Saferstein [Saferstein, 2003] writes that when processing a crime
scene a basic methodology should be followed, one that we espouse when
dealing with a computer incident as well. Here are his first steps
to an investigation:

Secure and isolate.

Record the scene.

Conduct a systematic search for evidence.

Collect and package evidence.

Maintain a chain of custody.

It doesn't take much imagination to see how all of these apply to
computers.

Before you start

You should first consider how much time you would like to spend on
analyzing the data, as it is a time-consuming processes to both collect
and to process all the information. As an only slightly tongue-in-cheek
guide we offer table B.2 as a possible guide.

ACTION

EXPERTISE REQUIRED

TIME CONSUMED

Go back to work

None

Almost none

Minimal effort

Installing system software

1/2 - 1 day

Minimum Recommended

Jr. System Administrator

1-2 days+

Serious effort

Sr. SA

2+ days - weeks

Fanaticism

Expert SA

days - months+

Table B.2: Rough estimate of the cost of an investigation.

A tremendous amount of time can be consumed taking care of the problem
at hand, but as a rule of thumb if you don't expend at least a day or
two you're probably doing yourself and your system a disservice. One of
the more difficult things to judge is how much effort to put into the
analysis. Oftimes the more analytical sweat you emit the more clarity and
understanding you have of the situation. That said, some situations are
harder and at times an intruder is more careful and skilled than others,
and unfortunately you never know prior to the break-in what to expect.
The truth is you'll never absolutely, positively know that you've found
all that you can. Experience will be your only guide here.

You'll next need, at least, a pad of paper, something to write with,
and a secure, preferably off-line location to store information that
you collect. A second computer that could talk to the network would
be a welcome, but not necessary, aide. A laptop may be used to store
results, notes, and data, but be cautious about who has access to the
second system.

Even though installing or downloading programs will damage evidence it
is sometimes a necessity in order to process a computer. The negative
effects of this can be mitigated by having a personal collection of
forensic tools at the ready. But automation should be used at all costs,
unless it is completely unworkable.

If commercial packages are not an option such open source projects as
FIRE [FIRE, 2004], PLAC [PLAC 2004], and others based on the impressive
Knoppix Linux distribution [Knoppix, 2004] may be placed on CDs or other
portable media and used when an emergency occurs.

Actually collecting data

We apologize for our UNIX-centric advice, but the same
roughly holds for any operating system. The computer(s) in question
should be taken offline. There are some potential problems with this,
as the system might expect to be online, and this could destroy evidence
by generating errors, repeatedly retrying connections, or in general
causing the system to change. Alternately you might try cutting it off
at router and keep it on a LAN, but DNS and network services as well as
other systems in area can still cause problems.

As you proceed along you need to keep track of everything you type or do.
In general it's a "grab first, analyze later" situation, however.
Note the hardware, software, system, and network configurations that
are in place.

If you're serious about collecting the data, however, we might suggest
you capture data in the following order, which mirrors the OOV:

Device memory, if possible. Alas, few tools exist to
do this.

Main memory. Using the guidelines set in chapter 8, capture
RAM and store it offline.

Get the journal, if you're dealing with a journaling file system.
Section 2.8 shows how this can be done for Ext3fs file systems with
TCT's icat command.

Get all transient state that you can. TCT's Grave-robber can
aide you with this.

Capture information from and about the disk. If possible get
the entire disk; we cover this in chapter 4. If not, TCT's Grave-robber
can at least get the important parts.

There you go, you now have all the data that is fit to save. Now all
that remains is to analyze it....

B.5 Conclusion

Forensic computing does not often yield black and white decisions
or clear and unambiguous understandings of a situation in the past.
Like its counterpart in the physical realm of criminal investigations
the quality and integrity of the investigators and laboratories involved
are the most important thing. And the tools and methodologies used will
significantly changing for the better over time.

It's a field that perhaps should be taken with more gravity than most
of computer science. The FBI said that "Fifty percent of the cases the
FBI now opens involve a computer" [Williams, 2002] and the number is
surely rising. Programs and people involved in the gathering and
analysis of evidence must take special care, for people's freedoms, lives,
jobs, and more can and are seriously impacted by the results.

Nothing is certain, but while probabilistic forensics does have a negative
sound to it, it's what we now have. However, much has been written and
achieved in the short time this field has been around, and we fervently
hope that progress continues, because it will benefit us all. With best
luck to the future,