Monday, September 24, 2012

HowTo: Scan for Internet Cache/History and URLs

This post will describe how you can leverage the flexibility of the Volatility framework to locate IE history from Windows memory dumps. Such artifacts have traditionally not been a priority, because the data is in user-mode (i.e. index.dat mappings) and the structure format is already well understood and documented - thus there's not much challenge to the task. However, in the interest of helping others learn some memory analysis techniques, I'll go through a short tutorial of how to locate and parse Internet history both with and without dedicated plugins. If you need to verify a user's activity on a website or determine the source of a drive-by malware infection, this information may be useful to you.

The first thing to do is identify the Internet Explorer (iexplore.exe) process(es) in the memory dump. We'll discuss this in more detail later, but it is very important to remember that IE is not the only application that maps sections of index.dat into memory. Any tool using IE through a COM object, or even malware using WinINet APIs (such as InternetOpenUrl, InternetReadFile, HttpSendRequest, etc.), will alter the cache/history; thus they may have portions of the data in memory.

Now that we know the PIDs of IE processes (2580 and 3004), we can use the existing yarascan plugin to get an initial view of where index.dat file mappings may exist. Since the file's signature includes "Client UrlCache" that's a good starting point.

Now you know you're at least barking up the right tree. However, to simply find visited URLs and things of that nature, you don't need to parse the index.dat file header at all. For example, you can just scan for the individual cache records, which start with "URL ", "LEAK", or "REDR" (there's also a HASH tag, but it's not necessary for our goals). Feel free to combine the multiple strings into a regular expression so you only need to search once:
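
To make the single-pass idea concrete outside of any particular tool, here is a minimal Python sketch that searches a byte buffer for all three record tags with one alternation. The function name and the synthetic buffer are assumptions made for illustration; the same alternation can be expressed as a regular-expression rule for a Yara-based scan.

```python
import re

# One pattern covering all three record tags; note the trailing space in "URL ".
RECORD_TAG = re.compile(b"URL |LEAK|REDR")

def find_record_tags(data):
    """Return (offset, tag) pairs for every record tag found in a byte buffer."""
    return [(m.start(), m.group(0)) for m in RECORD_TAG.finditer(data)]

# Synthetic buffer standing in for a chunk of process memory (not real data).
sample = b"\x00" * 8 + b"URL " + b"\x00" * 12 + b"REDR" + b"\x00" * 4 + b"LEAK"
hits = find_record_tags(sample)
```

Each hit is only a candidate: a four-byte tag can occur by chance, so the offsets still need to be validated against the record layout.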

As explained in the file format documentation, at offset 0x34 of a "URL" or "LEAK" string, you can find a 4-byte number (68 00 00 00 in the above examples) which specifies the offset from the beginning of the string to the visited location (i.e. URL). For redirected URLs, the location can be found at offset 0x10 of the "REDR" string. So given that information, you've already started finding non-arbitrary URLs in memory (i.e. ones that in fact were related to the cache/history and not just a domain name floating around).
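
The offset arithmetic described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the plugin's code: the helper names, the 4096-byte cap on string length, and the synthetic records are all hypothetical, while the 0x34 and 0x10 offsets come from the description above.

```python
import struct

URL_OFFSET_FIELD = 0x34   # "URL " / "LEAK": 4-byte offset to the location string
REDR_LOCATION = 0x10      # "REDR": the location string starts here

def read_cstring(buf, start, max_len=4096):
    """Read bytes from start up to the first NUL (or max_len, an assumed cap)."""
    end = buf.find(b"\x00", start, start + max_len)
    if end == -1:
        end = min(len(buf), start + max_len)
    return buf[start:end]

def record_location(buf, hit):
    """Return the location stored in the record whose tag begins at hit."""
    tag = buf[hit:hit + 4]
    if tag in (b"URL ", b"LEAK"):
        # Little-endian 32-bit offset, relative to the start of the record.
        (rel,) = struct.unpack_from("<I", buf, hit + URL_OFFSET_FIELD)
        return read_cstring(buf, hit + rel)
    if tag == b"REDR":
        return read_cstring(buf, hit + REDR_LOCATION)
    return None

def make_url_record(tag, url, url_offset=0x68):
    """Build a synthetic record for testing (all other fields zeroed)."""
    rec = bytearray(url_offset)
    rec[0:4] = tag
    struct.pack_into("<I", rec, URL_OFFSET_FIELD, url_offset)
    return bytes(rec) + url + b"\x00"

url_rec = make_url_record(b"URL ", b"http://example.com/index.html")
redr_rec = b"REDR" + b"\x00" * 12 + b"http://example.com/moved" + b"\x00"
```

Calling record_location(url_rec, 0) follows the 0x68 offset stored at 0x34, while record_location(redr_rec, 0) reads directly from 0x10.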

Designing a Plugin

While you've seen a quick and dirty way of locating sites in the IE history, there may be a need for different output formatting and better automation/parsing of the results. For example, instead of a hex dump, you might want a CSV file of visited URLs, with timestamps, the HTTP response data, and various other fields. For that, you'll need a dedicated plugin, but that can all be done in about 100 lines of code. To start, first define the record structures using Volatility's vtypes language, as shown below. It's all pretty basic, except we use a little trickery to automatically determine the location of the URLs within the structures (using the lambda functions which build an absolute address based on the structure's base address plus the 4-byte offset).
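
As a rough idea of what such definitions can look like, here is a trimmed-down sketch in the vtypes layout (name: [size, {member: [offset, [type, options]]}]). Only the 0x34 and 0x10 offsets come from the format description; the member names, the 4096-byte string cap, and the stand-in record object are assumptions for this sketch, and the real plugin defines more fields.

```python
from collections import namedtuple

# Hypothetical, minimal record definitions in the vtypes layout.
ie_history_vtypes = {
    '_URL_RECORD': [None, {
        'Signature': [0x00, ['String', dict(length=4)]],
        'UrlOffset': [0x34, ['unsigned int']],
        # Callable offset: an absolute address built from the record's
        # base address plus the offset stored inside the record.
        'Url': [lambda x: x.obj_offset + x.UrlOffset,
                ['String', dict(length=4096)]],
    }],
    '_REDR_RECORD': [None, {
        'Signature': [0x00, ['String', dict(length=4)]],
        'Url': [0x10, ['String', dict(length=4096)]],
    }],
}

# Stand-in for an instantiated record, used only to exercise the lambda.
FakeRecord = namedtuple('FakeRecord', ['obj_offset', 'UrlOffset'])
url_member_offset = ie_history_vtypes['_URL_RECORD'][1]['Url'][0]
resolved = url_member_offset(FakeRecord(obj_offset=0x2A5000, UrlOffset=0x68))
```

Here resolved is 0x2A5068: the record base plus the stored offset. Because the callable is evaluated per object, every record can place its URL at a different distance from the tag.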

Then we'll build a plugin based on the Plugin Interface wiki page in the Volatility Developer Guide. The plugin's name will be iehistory and it will inherit from taskmods.DllList for access to the existing command-line arguments like --pid and --offset (for filtering or selecting specific processes). We'll also add two extra options (--leak and --redr) so that reporting deallocated and redirected records can be optional. The full plugin source code can be viewed in the 2.3-devel branch.

## Modules used by the plugin code in this post
import volatility.obj as obj
import volatility.utils as utils
import volatility.win32.tasks as tasks
import volatility.plugins.taskmods as taskmods

class IEHistory(taskmods.DllList):
    """Reconstruct Internet Explorer cache / history"""

    def __init__(self, config, *args, **kwargs):
        taskmods.DllList.__init__(self, config, *args, **kwargs)
        config.add_option("LEAK", short_option = 'L',
                          default = False, action = 'store_true',
                          help = 'Find LEAK records (deleted)')
        config.add_option("REDR", short_option = 'R',
                          default = False, action = 'store_true',
                          help = 'Find REDR records (redirected)')

Now for the all-important calculate function. This is where we do the majority of the work. As usual, we'll acquire a kernel address space (using the Idle process DTB). We'll build a list of tags based on the selected command-line options and associate the tags with our _URL_RECORD and _REDR_RECORD structures. Please note that since "LEAK" structs are nearly identical to "URL " structs, we just alias the two and merge them. We use the _EPROCESS.search_process_memory() API and pass it the list of strings to find. For each hit (an address in process memory), we create the correct structure, check whether it's valid with some sanity checks, and then yield the process object and the record to the render function (not shown).

    def calculate(self):
        kernel_space = utils.load_as(self._config)

        ## Select the tags to scan for. Always find visited URLs,
        ## but make freed and redirected records optional.
        tags = ["URL "]
        if self._config.LEAK:
            tags.append("LEAK")
        if self._config.REDR:
            tags.append("REDR")

        ## Define the record type based on the tag
        tag_records = {
            "URL " : "_URL_RECORD",
            "LEAK" : "_URL_RECORD",
            "REDR" : "_REDR_RECORD"}

        ## Enumerate processes based on the --pid and --offset
        for proc in self.filter_tasks(tasks.pslist(kernel_space)):
            ## Acquire a process specific AS
            ps_as = proc.get_process_address_space()

            for hit in proc.search_process_memory(tags):
                ## Get a preview of the data to see what tag was detected
                tag = ps_as.read(hit, 4)

                ## Create the appropriate object type based on the tag
                record = obj.Object(tag_records[tag], offset = hit, vm = ps_as)

                if record.is_valid():
                    yield proc, record
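
Because a four-byte tag can appear in memory by coincidence, the sanity checks matter. The plugin's own is_valid() is not shown here, so the following is only a sketch of the kind of heuristics that could filter false positives on a raw buffer; the function name, the 0x8000 upper bound, and the printable-ASCII rule are all assumptions.

```python
import struct

def is_plausible_url_record(buf, hit, max_record=0x8000):
    """Heuristic check that a "URL "/"LEAK" hit is a real record (assumed rules)."""
    if hit + 0x38 > len(buf):
        return False                  # not enough data to hold the offset field
    (url_off,) = struct.unpack_from("<I", buf, hit + 0x34)
    if not 0x38 <= url_off < max_record:
        return False                  # offset points into the header or far away
    start = hit + url_off
    first = bytearray(buf[start:start + 4])
    # Require a few printable ASCII bytes where the location should begin.
    return len(first) == 4 and all(0x20 <= b < 0x7F for b in first)

# Synthetic test data: one well-formed record and two false positives.
good = bytearray(0x68)
good[0:4] = b"URL "
struct.pack_into("<I", good, 0x34, 0x68)
good = bytes(good) + b"http://example.com\x00"
bogus = b"URL " + b"\xff" * 0x80
truncated = b"URL " + b"\x00" * 8
```

Stricter or looser rules are a trade-off: tight checks hide damaged records that may still be useful, while loose checks let more noise through to the output.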

Using the IEHistory Plugin

The plugin has two rendering options. The default "text" mode will output blocks of data - one for each cache hit. For example:

Note: as explained in the libmsiecf index.dat format reference, the timestamps may be UTC or localtime depending on whether the record is found in a global, weekly, or daily history file. The caveat to scanning for individual record tags is that there's no link back to the containing history header, so you can't easily determine if UTC or localtime is correct. Currently we use UTC for everything and may determine a better fix sometime before the 2.3 release.
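
For readers decoding the timestamps by hand, the record timestamps are Windows FILETIME values: 64-bit counts of 100-nanosecond intervals since January 1, 1601. The sketch below converts one to a naive Python datetime; the function name is hypothetical, and deciding whether the result is UTC or localtime is deliberately left to the analyst for the reason given in the note above.

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)

def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME (100 ns ticks since 1601-01-01) to a naive datetime."""
    # Integer division drops the sub-microsecond remainder.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

# 116444736000000000 ticks separate the FILETIME epoch from the Unix epoch.
unix_epoch = filetime_to_datetime(116444736000000000)
```

The returned value carries no timezone on purpose, so it is not mislabeled as UTC when the record came from a history file that stores localtime.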

Malicious Code Example

I randomly chose one of Hogfly's public memory images (exemplar 17) to test the plugin against, and was rather pleased with what I found. Remember earlier when we discussed the fact that processes other than IE will have parts of the cache/history in memory? This certainly proves the point. Take a look, and notice we don't specify any --pid, so the plugin scans the memory of all processes. You'll see hits in both explorer.exe (Windows Explorer) and a strange PID 1192 named 15103.exe.

Volatility can be as thorough as you want it to be. That said, there are a few situations we haven't discussed. For example, what if you're looking for URLs just in process memory (i.e. embedded in a web page but not yet visited, in JavaScript code, in a SWF string, etc.)? IE history files are also known to have "slack space" where old records with long URLs may be overwritten by new records with shorter URLs, thus leaving part of the original domains intact. Furthermore, what about browsers that store history in different formats, like Firefox and Chrome?

In the above cases, you can always search for URLs in a more forceful, yet unstructured, manner. If you don't already have a favorite PCRE for finding domains, IPs, and URLs, try some of the ones on http://regexlib.com/Search.aspx?k=URL (some may not work with Yara). For testing purposes I just grabbed the first one on the list:
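
As an illustration of the unstructured approach, here is a small Python sketch that pulls URL-like strings out of a byte buffer. The pattern is a deliberately simple one written for this example, not the expression taken from regexlib.com, and the sample buffer is synthetic.

```python
import re

# A simple, assumption-laden pattern: scheme, host or dotted IP,
# optional port, optional path that stops at whitespace, quotes,
# angle brackets, or non-ASCII bytes.
URL_PATTERN = re.compile(
    b"(?:https?|ftp)://"
    b"[A-Za-z0-9.-]+"
    b"(?::[0-9]{1,5})?"
    b"(?:/[^\\x00-\\x20\\x7f-\\xff\"'<>]*)?"
)

def find_urls(data):
    """Return every URL-like byte string found in data, in order."""
    return [m.group(0) for m in URL_PATTERN.finditer(data)]

# Synthetic buffer: a URL inside script text and one between NUL bytes.
sample = (b"\x00\x01var u='http://evil.example/payload.exe?id=1';\x00"
          b"ftp://10.0.0.5:2121/x\x00")
urls = find_urls(sample)
```

This only finds single-byte (ASCII) strings. Windows processes often hold URLs as UTF-16 as well, so a wide-character variant of the pattern would be needed to catch those.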

Whether you're searching for complex data structures, simple strings, regular expressions, or byte patterns, Volatility can do what you want (if you learn to use it right) or it can easily be programmed for your needs. In this post we discussed how to find and parse Internet history records - a task we normally wouldn't prioritize above some of the other really exciting and innovative things going on in our labs. However, we know Volatility users will enjoy some extra tricks and information on plugin development. Learn by example, become a power user, then spread the knowledge!