Tuesday, July 15, 2008

Linux Memory Forensics

A collaboration with the PyFlag team, Michael Cohen and David Collett.

One of the major components of the DFRWS 2008 challenge was to improve the state of Linux memory forensics techniques and to develop tools that are applicable to a broad range of systems and forensic challenges that an investigator may face. In this section, we will discuss the efforts that we have made in order to address those objectives. Our goal was to make a variety of new tools and techniques available to investigators and demonstrate how they can be used to help investigate the memory sample provided as part of the challenge (challenge.mem). At the end of this section, we will also address how the information extracted from RAM can be leveraged in the second major component of the challenge, the fusion of memory, hard disk, and network data.

Previous research has demonstrated that memory forensics is often an important component of the digital investigation process [cite]. Memory forensics offers the investigator the ability to access the runtime state of the system and has a number of advantages over traditional live response techniques, typically used by forensic toolkits [cite]. While there has been some previous research into Linux memory forensics, the majority of the recent work has focused primarily on Windows memory analysis.

In 2004, Michael Ford demonstrated how an investigator could use many of the preexisting tools used for crash dump collection and analysis to help perform analysis in the wake of an incident[cite]. In particular, he described how the the "crash" utility can be used to investigate a crash dump collected from a compromised system. While "crash" proved a valuable tool for analyzing crash dumps, the author is forced to rely on "crude" techniques for analyzing memory samples that were not collected in a crash supported format (ie linear mapping of physical memory). Also in 2004, Mariusz Burdach describes collecting a sample of physical memory through from the /proc pseudo-filesystem and it's kcore file[cite]. He began by performing basic analysis (grep, strings and hex editors) to look for interesting strings and he then discussed advanced analysis that could be performed by painstakingly using gdb to analyze the system call table and list running processes. In 2005, Sam Stover and Matt Dickerson used a string searching method to find malware strings in the memory sample collected from /proc/kcore on a Linux system [cite]. Later in 2005, Burdach extended this research by releasing the idetect tools for the 2.4 kernel, which aided in extracting file content from memory and listing user processes[cite]. In 2006, the FATKit project described generic architecture to effectively deal with memory forensics abstractions allowing support for both Linux and Windows analysis, as demonstrated in the example modules[cite]. In 2006, Urrea also described techniques for enumerating processes and manually rebuilding a file from memory[cite].

As we can see in each of these previous examples, debugging tools and their supporting information (ie Symbols) have played an important part of Linux memory forensics. As a result, we felt it was important to leverage as much of the previous work and experience with Linux kernel debugging as possible. Thus our first contribution with respect to this challenge was to create a patch for the Red Hat crash utility, which is maintained by David Anderson. This is the same utility that was originally discussed by Ford, but now we have modified it so that it can analyze a linear sampling of physical memory, as in the case of the challenge.mem sample distributed with the challenge.

Red Hat Crash Utility

The Red Hat Crash Utility combines the kernel awareness of the UNIX crash utility with the source code debugging abilities of gdb. It is also has the ablility to analyze over 14 different memory sample formats. Another advantage of crash is that it has support for a number of different architectures (x86, x86_64, ia64, ppc64, s390 and s390x) and versions of Linux (Red Hat 6.0 (Linux version 2.2.5-15), up to Red Hat Enterprise Linux 5 (Linux version 2.6.18+)). Thus it really does address the need to have a broad applicability. Our patch for crash can be found at the following following url:

Once the patch has been applied (patch -p1 <volcrash-4.0-6.3_patch) and the source code built (make), you will also want to obtain the mapfile and namelist (a vmlinux kernel object file) for the DFRWS memory sample.

In order to process a linear sampling of memory, you will need to pass the --volatile command line option as seen in the following example:

In this section, we will discuss how we can use the crash commands to help extract artifacts from the memory sample found in the challenge. Upon successful invocation, crash will present information about the system whose memory was sampled. For the image in the challege, the output will look like this.

From this information, we can see that the sample was taken on Sun Dec 16 23:33:42 2007 and the machine had been running for 00:56:51. It also gives us a lot other interesting information from the image such as the amount of memory, the number of processors, etc. Our patch sets the current context to the Linux task with the PID of 0. As seen, in the output this is the PID for the "swapper" task. If necessary, this context can be changed using the "set" command. Information about available commands can be found through the "help" command. In the following sections we will demonstrate the type of information that can be extracted using crash. In particular, we will primarily focus on those things germane to the challenge.

Processes

Listing tasks is often one of the first things people want to do to see what is actually running on the system. By issuing this command, the investigator will receive information about process status similar to the Linux ps command:

From this output we can extract information about the processes that were active on the box when the sample was collected. The ps command also has a number of useful command line options. For example, the investigator may want display a processes parental hierarchy to determine how it was invoked (-p). As seen in the following output, the -t option can also be used to display the run times, start times, cumulative user and system times for the tasks. This information can be extremely useful as part of time line analysis and for determining the temporal relationships between events that occurred on the system.

Using the -a option we are able to discern the command line arguments and environment strings for each of the user-mode tasks. This maybe particularly useful when encountering an unknown process in memory or determining how an suspicious executable was invoked. This can also be helpful for mapping a process and it's associated UID back to the user when the /etc/passwd file is not available. For example, by leveraging the environment strings we can determine that the bash process (PID: 2585) was started by user stevev.

We are also able to extract the open files associated with the context of each task. Beyond presenting information associated with each of the open descriptors, it also prints current root directory and the working directory for each of those contexts. This can often provide valuable leads when dealing with the large volume of evidence associated with modern investigations.

We can also extract information about each tasks open sockets. This can be useful to determine if there are any open connections with other systems that need to be investigated further. It will also show if the systems is listening on any ports which may have been points of entry or backdoors left behind. We can see that in the case of the challenge memory sample there aren't any open connections but the dhclient process (PID: 1565) has a socket with source port 68 and sendmail process (PID: 1872) has a socket with source port 25.

There are a couple of things to note from the previous output information. First from the swap information we can see that the load on the system is not causing pages to be swapped out. Second by leveraging the data in the kernel message buffer we can get an indication of when the system was booted. For example, by looking at the audit(1197861235.541:1): initialized boot message which has a unix timestamp of 2007-12-16 22:14:01.

This was just a sample of the type of information that is available through the default command set that comes with crash. Another benefit associated with leveraging the Red Hat Crash Utility is that the command set can be extended through loading shared libraries. In the following section, we will discuss an extension module that will allow us to use Python scripts to interface with crash.

PyKdump Framework (Python scripting for crash)

PyKdump, written by Alexandre Sidorenko, embeds a Python interpreter as a dynamically loadable 'crash' extension so you can create Python scripts to help perform analysis. In the following sections, we will show how PyKdump can help extract information from the challenge memory sample.

PyKdump includes a program called xportshow which can be used to extract a lot of useful network related information beyond what is available in the crash default command set. PyKdump and the xportshow program can also be used to extract important information from the challenge sample.

One of the first things we can do is extract detailed information about system's available interfaces. This allows us to extract information similar to that provided by the Linux command "ifconfig". This is useful for extracting the state of those interfaces including the times since they may have transmitted or received packets and whether the interface is in promiscuous mode or not. From this we can also confirm that the IP address of the eth0 interface is 192.168.151.130 which can help as we analyze the pcap data. crash> xportshow -ivoutput

Using xportshow, we can also extract information from the internal ARP cache. This can be useful to determine other systems that may need to be investigated or to determine if the ARP cache has been manipulated in any way.

While on the topic of layer 3 routing, we can also use xportshow to extract the route cache also known as the forwarding information base (FIB) on Linux. This stores recently used routing entries and is consulted before going to the routing table. Thus we can use this information to determine other machines the system was communicating with and look for signs manipulation. For example the route cache for the challenge image shows that our suspected system (192.168.151.130) previous communicated with the following addresses: 219.93.175.67, 86.64.162.35, 192.168.151.2,192.168.151.254. The 219.93.175.67 address corresponds to the address where the zip files was being exfiltrated.

Now continuing to move up the stack we can also use xportshow to once again extract all the open sockets. As seen in the following results, xportshow presents this information in a format similar to netstat. This is extremely useful for determining both active network connections or listening services. It also provides a number of command line arguments for filtering the output.

PyKdump also has provides a crashinfo program that can print the systems runtime parameters (sysctl), file locks, and stack summaries.

As you can see, our patch now allows us to leverage both the Red Hat Crash Utility and PyKdump to extract a lot of valuable information from the memory sample in the challenge. The goal of our further development efforts were to leverage the power of these tools while developing new tools and techniques that are applicable to an even broader range of systems and forensic challenges than just the debugging Linux systems. The following sections will describe how we addressed those goals using Volatility, the open source volatile memory artifact extraction utility framework. We will also discuss how we are adding support to Volatility that will allow you to run your PyKdump commands transparently, even while working on a Windows host. By leveraging Volatility, our efforts for combining multiple data sources will not be limited to a particular operating system.

Volatility

Volatility is an open source modular framework written in Python for extracting digital artifacts from acquired samples of volatile system memory. From it's inception it was designed to be a modular and extensible framework for analyzing samples of volatile memory taken from a variety of operating systems and hardware platforms. The Volatility Framework builds upon research we performed on both VolaTools and FATKit. While previous versions of the framework focused on the analysis of Windows XP SP2 samples, as a part of this challenge we will demonstrate how it can be easily adapted to other operating systems as well (i.e. Linux). This challenge also allowed us to make use of the powerful new features which were added to Volatility 1.3.The power of Volatility is derived from how it handles the abstractions of volatile memory analysis within it's software architecture. This architecture is divided into three major component: Address Spaces, Objects and Profiles, and Data View Modules.

Address Spaces

Address spaces are intended to simulate random access to a linear set of data. Thus each address space must provide both a read function and a function to test whether a requested region is accessible. It is through the use of address spaces that Volatility is able to provide support for a variety of file formats and processor architectures. These address spaces are also designed to be stackable while maintaining the ability to have concurrent handles to the same data through different transformations. In order to analyze the challenge.mem sample, we make use of both the FileAddressSpace and the IA-32 paged virtual address space, IA32PagedMemory, that are also used for Windows memory analysis.

Objects and Profiles

Objects refer to any data that is found within an address space at a particular offset. The new object model included in 1.3, which was used in the software for this challenge, supports many of the semantics of the C programming language. Volatility uses profiles to define those object formats. When analyzing an Linux sample, the profile can be automatically generated from the source code or debugging information. For the challenge we will be using a profile generated for the 2.6.18-8.1.15.el5 kernel. We also include the System.map as a component of the profile as well.

Data View Modules

Data view modules provide algorithms to find where the data is located. These are the methods used to collect data or objects from the sample. For this challenge we created 11 new data view modules to facilitate analysis of Linux samples. The following sections will describe each of the new modules that was created. These new modules were also built for the new pluggable architecture included in Volatility 1.3. This allows new modules to be added without requiring any changes to source code.

Strings

As we mentioned previously, one of the most common forms of analysis performed on a sample of physical memory is to look for sequences of printable characters extracted using the "strings" command. Thus it is here that we will begin our discussion of analyzing memory using Volatility. One of the major limitations with relying on this method of analysis alone is that it is a context free search. Thus it simply treats the sample of memory as a big block of data. For example, while reviewing the strings from this image we are able to find strings related to bash command history resident in memory. From these commands we can see that someone on the system attempted to copy Excel spreadsheets and Pcap files from an admin share (/mnt/hgfs/Admin_share) to a temp file. At some other point they attempted to discover if a vulnerable version of the X windows system was running on the system. They then proceeded to download and execute a privelege escalation exploit from the metasploit project intended to gain root privileges.In an attempt to add more context to these types of strings we created a module called linstrings which provided the equivalent functionality to Volatility's string command. This allows us to map the strings extracted from the memory sample back to the corresponding virtual address and associated process. This mapping is accomplished by walking the address translation tables and determining which processes has the ability to access the physical page where the string is located. In the Linux version we only consider the user land address space.

The ability to map these strings back to their respective processes is extremely useful. We can can see that all the strings in the previous table were addressable by processes with a UID of 501, which is the UID for user stevev, Steve Vogon.

The linident module is used to provide valuable information about the system the memory sample was acquired from. This module provides similar information to the crash sys command but it has been augmented to include timezone information, which we have found useful during temporal reconstruction. As seen the the following output the local timezone for the system was GMT-5. It also provides the GMTDATE corresponding to when the sample was acquired. The current time and timezone information can also be obtained from the lindatetime module as well.

We also provide a module that will extract what processes were running on the system when the sample was acquired. We have augmented this to also include the UID of the process owner. By combining this with the strings to process mapping provided by linstrings we are able to attribute those strings to a particular user. For example by correlating with the environment information previously discussed ( or if /etc/passwd was available) we know any process with UID 501 can be attributed to user stevev. We also know that any strings mapping to those process are related to that user as well.

We have also included linpsscan which makes use of the Volatility scanning framework. Unlike linps which traverses the operating system data structures to find the processes that were running on the system, this modules performs a linear scan of the physical memory sample while searching for task_struct objects which it treats as a constrained data item. These contraints were automatically developed by sampling valid task_structs from the memory sample. The benefits associated with this technique can be seen in the fact that the previous module, linps, was only able to enumerate 89, while linpsscan found 10 more with the physical address space. We have also included the UID so each task_struct could be mapped back to a user.

We also created a module called linmemdmp. This module automatically rebuilds the address space for a specified process and dumps its entire addressable memory to a file for further analysis. This can be extremely useful if you are attempting a brute force encryption keys (ie SSL) or you want to add some context to your string searches. The process to be dumped can be specified by either a PID (-P) or task_struct physical memory offset (-o) depending on whether it was discovered with linps, or linpsscan, respectively..

We also created a linpktscan module that performs a linear scan of the sample of physical memory looking for memory resident network packets. This module makes use of the Volatility generic scanning framework to describe network packets as constrained data items. The current implementation constrains the sought after data to either UDP or TCP packets with a header of minimum length that has a valid IP header checksum. Another nice feature of this module is that it also allows the investigator to extract those packets from memory and write them to a pcap file that can then be imported into their favorite packet analysis tool (ie Wireshark).

Using this module on the memory sample provided in the challenge, we were able to see that the system had recently communicated with the following IP addresses over http: 219.93.175.67, 198.105.193.114. Of particular interest are the packets being sent to the 219.93.175.67 address. This was the address where the zip file was exfiltrated using http cookies. By using linpktscan we are able to find and extract memory resident packets with cookies containing parts of the exfiltrated data. Thus we are able to connect the data in the pcap files back to the system.

On another interesting note, we are also able to extract FTP packets flowing between 10.2.0.2 and 10.2.0.1. These memory resident packets were part of the ftp.pcap file that was exfiltrated. Thus we know that at some point this file was loaded into memory on the system.

linvm

This module will display the virtual memory mappings for each process. This provides information analogous to that typically found by the maps file in the /proc entry for the process. This can be extremely useful for determining which files maybe memory mapped by a process and where they can be found within memory. This can be extremely helpful for determining how the address space is being used. We have also augmented the output to include information about the code, data, and stack regions of a processes virtual address space in case an investigator wants to extract them from memory as well.

We have also included a module linsockets which can be used to extract information about each task's open sockets. As previously mentioned this can be useful for determining if there are any open connections with other systems or if the system is listening on any unexpected ports and if so which process is responsible.

We are also able to extract the open files associated with the context of each task. As previously mentioned, this can often provide valuable leads to target files or directories of interest when dealing with large disk images.python volatility linfiles -f challenge.mem -p profiles/2_6_18-8_1_15_el5/centos-2.6.18-8.1.15.el5.types.py -s profiles/2_6_18-8_1_15_el5/System.map-2.6.18-8.1.15.el5output

linmodules

The final module which we have included is linmodules. This will print basic information about the currently loaded kernel modules. This allows an investigator to determine if anyone may have attempted to load a kernel module to dynamically change the behavior of the kernel.

As you can see, Volatility provides a powerful software architecture that allows it to be easily adapted to whatever type of hardware or operating system the investigator needs to analyze. It also provides extremely useful APIs and libraries that allow investigators quickly create new modules to support their investigations and to easily share those modules with colleagues. We are also finishing up code that will allow you to run PyKdump scripts transparently within both crash or Volatility. Another advantage of Volatility is that it allows allows the analyst perform their investigations on any operating system which supports Python. Thus we believe that Volatility allows us to achieve our goals of leveraging previous work in kernel debugging while being applicable to a broad range of systems. Finally, Volatility is currently integrated into a number of analysis frameworks including both PTK and PyFlag.

PyFlag Memory Analysis

PyFlag has officially supported memory forensics since it's integration of Volatility in January of 2008. Thus allowing an investigator to correlate disk images, log files, network traffic, and memory samples all within an intuitive interface. It was also the first framework to support analysis of memory samples stored in either EWF or AFF formats. In this section, we will discuss how with the upcoming release of Volatility 1.3 this integration has been extend so that PyFlag now has the ability to support the analysis of both Linux and Windows memory samples. This functionality will be briefly discussed on the memory sample included in the challenge.

In order to analyze a memory sample with PyFlag, the sample must be loaded. This is accomplished by choosing the Load IO Data Source menu item found under the Load Data Tab at the top of the screen. At the Load IO Data Source page set the "Select IO Subsystem" to standard and leave the "Evidence Timezone" to SYSTEM. At the next "Load IO Data Source" page once again set the "Select IO Subsystem" to Standard and leave the "Evidence Timezone" to SYSTEM. Depending on whether your image can be found on disk or the Virtual File System, click either the finger pointing to the folder or the VFS files, respectively. Having already loaded the evidence file, we will search for the memory sample within the VFS. Thus we click on the VFS folder. At this point we will be presented with a table listing all the files in the VFS. In order to find the file we are looking for, challenge.mem, we click the funnel in the upper left hand corner of the window which will allow us to filter the table. At the Filter Table pop-up screen type, ("Filename" contains challenge) into the Search Query dialog box. Then, click the submit button at the bottom. At this point you should see the challenge sample in the table. After choosing the sample you will be returned to the Load IO Data Source page. Fill in the "Enter partition offset:" box with a zero and the "Unique Data Load ID" box with "mem".

Now click the Submit button in the lower left had corner of the windows. You will now be brought to the "Load Filesystem image" screen. Verify that the "Case" value is set to dfrws and the "Select IO Data Source" matches the value you entered for "Unique Data Load ID" on the previous screen. Then set the "Enter Filesystem type" drop down to "Linux Memory" and choose a mount point (ie memmnt) for the "VFS Mount Point" entry box.

Next click the Submit button at the lower left hand of the screen. At this point you will be prompted to choose a Volatility profile. Select the 2_6_18-8_1_15_el5 "Profile" from the drop down menu and click the Submit button again. Next you will be presented with another drop down menu to select a "Symbol Map". Select System.map-2.6.18-8.1.15.el5.map from the drop downs.

Once the System Map is selected, select the Submit button on the lower left had corner again. At this point it will begin to load the sample into the system. When it is finished loading you will return to the "Browsing Virtual Filesystem" windows and your sample will be mounted at the specified "VFS Mount Point", which in our example is memmnt. Now that our memory sample has been loaded you can access the data through a browseable /proc interface or through the "Memory Forensics" menu item at the top of the screen.

On the other hand, you can also access the data through the "Memory Forensics" menu item at the top of the page as seen in the following image.

By clicking on a linked address it will automatically perform the address translation and take you to the correct offset within the physical address space. As we previously mentioned we can also run PyFlags award winning collection of carvers against the loaded sample.