As you probably already know, the Windows Registry is a treasure trove of forensics artifacts that can come in quite handy during investigations and incident response. Many applications leave quite the trail, and I’ve decided to start documenting these less common sections in the registry and sharing the information that I find on my blog. We’ll start with Adobe Acrobat Reader:

In addition to recently accessed files showing up under the RecentDocs key, Acrobat Reader itself stores a list of the 5 most recently accessed PDF files in the user’s hive. This information can be found in the subkeys under Software\Adobe\AVGeneral\cRecentFiles. The subkeys found in this location are labeled cx (where x is replaced by the numbers 1 through 5), and under each of these subkeys you’ll find a value named tDIText which contains the full path and filename of the recently accessed pdf file. Every time a new PDF file is opened in Reader, any existing values found in cx are copied to cx+1 and any values that were in c5 are lost (of course, keep in mind that you may be able to use VSS to recover old hives). Unfortunately, Reader does not store date/time stamp values in these subkeys; however, you can get the date and time of the most recent file access (for the file information stored in c1) by reviewing the registry key’s last write time. For all of the other files described in the other subkeys, given no other supporting data, you’ll only be able to state that the pdf file was accessed but will be unable to definitively state when.
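To make the rotation behaviour described above a bit more concrete, here is a minimal Python sketch that models it with a plain list instead of the real registry. The function name and the MAX_RECENT constant are my own illustrative choices, index 0 stands in for c1, and the model ignores what Reader does when a file that is already listed gets re-opened (I have not tested that case).

```python
MAX_RECENT = 5  # the post observes five subkeys, c1 through c5


def open_pdf(recent, path):
    """Return the new recent-files list after `path` is opened.

    recent[0] models c1 (the most recent file). Every existing entry
    shifts down one slot and whatever was in the last slot is lost.
    """
    return ([path] + list(recent))[:MAX_RECENT]


recent = []
for name in ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf", "f.pdf"]:
    recent = open_pdf(recent, name)

# After six opens only the five newest remain; a.pdf has fallen off the end.
for slot, path in enumerate(recent, start=1):
    print(f"c{slot}: {path}")
```

Only the entry in slot c1 can be tied to the key’s last write time; the model makes it easy to see why the older slots carry no timing information of their own.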

If/when I discover any other interesting artifacts left by Adobe Acrobat Reader in the registry, I’ll make sure to update this post with my findings. Feel free to leave me a comment as well if you have any additional Reader related artifacts that you review as part of your workflow…

I got a bit waylaid with how Dropbox performs host-level authentication while I was researching and documenting forensic artifacts that Dropbox leaves lying around, but finally have gotten the chance to come back around to finish my research/documentation. Here’s a summary of my observations:

Dropbox binaries are installed into %AppData%\Dropbox\bin instead of the standard %PROGRAMFILES%. During the install, 13 registry keys were added, although none of them contained forensically useful data.

The Dropbox configuration and state are stored in SQLite files found in %AppData%\Dropbox:

config.db: contains baseline configuration settings that the Dropbox client references in order to run in a table named config. Records of interest include:

host_id: the authentication hash used by the Dropbox client to authenticate into the Dropbox “cloud.” This hash is assigned upon initial install/authentication and does not change unless revoked from the Dropbox web interface.

email: account holder’s email address. Can be changed to any value without consequence – set at install/authentication.

dropbox_path: actual path to the user’s Dropbox on the local system.

recently_changed3: lists the path/filename for the five most recently changed files, including files removed/deleted from the Dropbox. This is probably the only truly useful forensic artifact produced by Dropbox (other than the usual filesystem related artifacts). The BLOB for this record is text-based and is consistently formatted:

text begins with “lp1” and ends with “a.”

entries are ordered from most recent to least recent and are separated by line breaks. In each entry, the filename/path is followed by “I00” and “tp#” (where # is the entry’s position in the list + 1, i.e. the first entry is followed by “tp2”).

if the file has been removed/deleted from the Dropbox, the “I00” text is removed and an “N” is placed in front of the “tp#”. So, an example of a removed/deleted file would be: (V41725479:/new file.txt Ntp2

root_ns: appears to be used throughout the Dropbox DBs to reference the base Dropbox path/location.

filecache.db: contains a number of tables, but the primary focus is to describe all files actively in the Dropbox (deleted/removed files are removed from this table upon deletion/removal). Tables and records of interest:

file_journal: includes the filename, path, size (in Bytes), mtime (file modified time, in Unix/POSIX format), ctime (file created time, in Unix/POSIX format), local_dir (flag indicating whether the entry is a directory), and more (mainly unpopulated).

block_cache: hash id (id) and hash. Hash is of an unknown format and did not match up with anything I could generate using standard tools.

mount_table: appears to list folders that are shared with other Dropbox users.

host.db: actually not a SQLite database but contains what looks to be a hash of some sort (possibly SHA-1?) and the dropbox path (dropbox_path in config.db) encoded in base-64. The entire file may be encoded in base-64 (basing this on a few Dropbox forum postings I read), but the first part of the file does not decode into anything human readable or match any other fields that I observed in the other DBs.

sigstore.db: stores hash values which correspond to the values found in the block_cache table in filecache.db.

unlink.db: appears to be a binary file and is not a SQLite database. Its format and purpose are unknown.
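A note on the recently_changed3 BLOB: the markers described above (“lp1”, “V…”, “I00”, “N”, “tp#”, “a.”) look consistent with Python’s text-based pickle protocol 0. That is an inference on my part rather than something I have confirmed against the Dropbox client, and it assumes the BLOB actually starts with a “(” in front of the “lp1”. Under those assumptions, a sketch for decoding it could look like the following. Since unpickling data from a subject system can execute code, the sketch refuses to resolve any globals; the class and function names are my own.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything, for untrusted evidence."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def parse_recently_changed(blob):
    """Decode a protocol-0 pickle holding a list of (path, flag) tuples.

    Assumption: flag is False ("I00") for a file still present and
    None ("N") for a file that was removed/deleted.
    """
    # latin-1 never fails when decoding byte strings written by Python 2
    return NoGlobalsUnpickler(io.BytesIO(blob), encoding="latin-1").load()


sample = (
    b"(lp1\n"
    b"(V41725479:/new file.txt\nNtp2\na"
    b"(V41725479:/kept.txt\nI00\ntp3\na."
)
for path, flag in parse_recently_changed(sample):
    state = "removed/deleted" if flag is None else "present"
    print(f"{path} -> {state}")
```

The sample bytes are hand-built from the format description in this post, not taken from a real config.db, so treat the output as illustrative only.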

Honestly, short of the recently_changed3 record in the config database, there really isn’t a significant number of useful forensic artifacts generated by Dropbox. Given Dropbox writes to the local filesystem, your standard filesystem analysis steps will encompass files stored/synced into a subject’s Dropbox; but perhaps, under certain circumstances, the recently_changed3 record and/or the Dropbox ctime/mtime entries for files could come in handy…
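If you do want to pull the ctime/mtime entries out of file_journal, a small Python sketch along these lines may help. It opens the database read-only and converts the Unix/POSIX values to UTC. The column names (path, mtime, ctime) are placeholders based on my field descriptions above and are passed in as parameters, so check them against the actual schema before relying on the output.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file read-only so the working copy is not modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def to_utc(epoch):
    """Convert a Unix/POSIX timestamp to an ISO 8601 UTC string."""
    if epoch is None:
        return None
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


def journal_times(conn, table="file_journal", name_col="path",
                  mtime_col="mtime", ctime_col="ctime"):
    """Return (name, mtime_utc, ctime_utc) rows.

    Table and column names are hypothetical defaults; they are
    interpolated into the query, so only pass trusted identifiers.
    """
    query = f"SELECT {name_col}, {mtime_col}, {ctime_col} FROM {table}"
    return [(name, to_utc(m), to_utc(c)) for name, m, c in conn.execute(query)]
```

Usage would be something like journal_times(open_read_only("filecache.db")), run against a copy of the file rather than the original.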

For the past several days I have been focused on understanding the inner workings of several of the popular file synchronization tools with the purpose of finding useful forensics-related artifacts that may be left on a system as a result of using these tools. Given the prevalence of Dropbox, I decided that it would be one of the first synchronization tools that I would analyze, and while working to better understand it I came across some interesting security-related findings. The basis for these findings has actually been briefly discussed in a number of forum posts in Dropbox’s official forum (here and here), but it doesn’t quite seem that people understand the significance of the way Dropbox is handling authentication. So, I’m taking a brief break from my forensics-artifacts research to try to shed some light on what appears to be going on from an authentication standpoint and the significant security implications that the present implementation of Dropbox brings to the table.

To fully understand the security implications, you need to understand how Dropbox works (for those of you that aren’t familiar with what Dropbox is – a brief feature primer can be found on their official website). Dropbox’s primary feature is the ability to sync files across systems and devices that you own, automatically. In order to support this syncing process, a client (the Dropbox client) is installed on a system that you wish to participate in this synchronization. At the end of the installation process the user is prompted to enter their Dropbox credentials (or create a new account) and then the Dropbox folder on your local system syncs up with the Dropbox “cloud.” The client runs constantly looking for new changes locally in your designated Dropbox folder and/or in the cloud and syncs as required; there are versions that support a number of operating systems (Windows, Mac, and Linux) as well as a number of portable devices (iOS, Android, etc). However, given my research is focusing on the use of Dropbox on a Windows system, the information I’ll be providing is Windows specific (but should be applicable on any platform).

Under Windows, Dropbox stores configuration data, file/directory listings, hashes, etc in a number of SQLite database files located in %APPDATA%\Dropbox. We’re going to focus on the primary database relating to the client configuration: config.db. Opening config.db with your favorite SQLite DB tool will show you that there is only one table contained in the database (config) with a number of rows, which the Dropbox client references to get its settings. I’m going to focus on the following rows of interest:

email: this is the account holder’s email address. Surprisingly, this does not appear to be used as part of the authentication process and can be changed to any value (formatted like an email address) without any ill-effects.

dropbox_path: defines where the root of Dropbox’s synchronized folder is on the system that the client is running on.

host_id: assigned to the system after initial authentication is performed, post-install. Does not appear to change over time.

After some testing (modification of data within the config table, etc) it became clear that the Dropbox client uses only the host_id to authenticate. Here’s the problem: the config.db file is completely portable and is *not* tied to the system in any way. This means that if you gain access to a person’s config.db file (or just the host_id), you gain complete access to the person’s Dropbox until such time that the person removes the host from the list of linked devices via the Dropbox web interface. Taking the config.db file, copying it onto another system (you may need to modify the dropbox_path, to a valid path), and then starting the Dropbox client immediately joins that system into the synchronization group without notifying the authorized user, prompting for credentials, or even getting added to the list of linked devices within your Dropbox account (even though the new system has a completely different name) – this appears to be by design. Additionally, the host_id is still valid even after the user changes their Dropbox password (thus a standard remediation step of changing credentials does not resolve this issue).

Of course, if an attacker has access to the config.db file (assuming that it wasn’t sent by the user as part of social engineering attack), the assumption is that the attacker most likely also has access to all of the files stored in your Dropbox, so what’s the big deal? Well, there are a few significant security implications that come to mind:

Relatively simple targeted malware could be designed with the specific purpose of exfiltrating the Dropbox config.db files to “interested” parties who then could use the host_id to retrieve files, infect files, etc.

If the attacker/malware is detected in the system post-compromise, normal remediation steps (malware removal, system re-image, credential rotation, etc) will not prevent continued access to the user’s Dropbox. The user would have to remember to purposefully remove the system from the list of authorized devices on the Dropbox website. This means that access could be maintained without continued access/compromise of a system.

The host_id/config.db file is most likely much smaller than the full contents of a Dropbox folder, so transmitting it is far less likely to set off any detection alarms. Review/theft/etc of the data contained within the Dropbox could then be done at the attacker’s leisure from an external attacker-owned system.

So, given that Dropbox appears to utilize only the host_id for authentication by design, what can you do to protect yourself and/or your organization?

Don’t use Dropbox and/or allow your users to use Dropbox. This is the obvious remediating step, but is not always practical – I do think that Dropbox can be useful, if you take steps to protect your data…

Protect your data: use strong encryption to protect sensitive data stored in your Dropbox and protect your passphrase (do not store your passphrase in your Dropbox or on the same system/device).

Be diligent about removing old systems from your list of authorized systems within Dropbox. Also, monitor the “Last Activity” time listed on the My Computers list within Dropbox. If you see a system checking in that shouldn’t be, unlink it immediately.

Hopefully, Dropbox will recognize the need for additional security and add in protection mechanisms that will make it less trivial to gain long-term unauthorized access to a user’s Dropbox as well as provide better means to mitigate and detect an exposure. Until such time, I’m hoping that this write-up helps bring to light how the authentication method used by Dropbox may not be as secure as previously assumed and that, as always, it is important to take steps to protect your data from compromise.

Update (10/31/2011): Dropbox has released version 1.2.48, which utilizes an encrypted local database and reportedly puts in place security enhancements to prevent theft of the machine credentials. I have not personally re-tested this release – feel free to comment if you’ve validated that the new protection mechanisms operate as described.

Keyword searches can be a significant aspect of an investigation and given the prevalence of Microsoft Outlook you’ll most likely find yourself needing to search through PST files for data, be it a simple keyword or more complex pattern. Even though you can use Outlook to open up a PST file, my personal preference is not to do the search within Outlook itself for two primary reasons:

Outlook will change data within the PST file; of course, you’re working on a copy – but I prefer to not have dynamically changing data (i.e. unread/read status, etc) when I’m doing my analysis.

If you want to find data matching a certain pattern (e.g. regular expressions) or data that is not within the message body (e.g. message header data), Outlook does not really have the facilities to support these kinds of searches.

Of course, there are several commercial investigative tools that will parse through and allow you to search PST files (FTK and Encase come to mind) but in this post I’m going to focus on performing the extraction and search with only free tools in a Linux environment.

What you’ll need:

A relatively up-to-date Linux system (be it physical or VM).

Readpst compiled/installed (in Ubuntu: apt-get install readpst) – readpst is a utility included with libpst which can be found here.

Also, I’m going to begin by assuming that you’ve acquired the PST file in a forensically sound fashion and that a copy of the file is accessible on your Linux system. Let’s get started…

Extracting data from a PST file using readpst

Run readpst on the PST file to extract all objects within the PST (i.e. messages/attachments, calendar entries, contacts, etc). By default, readpst exports data in mbox format, which places all of the extracted objects into a set of mbox files (one per subfolder) and can make extracting objects that match your search criteria a bit tedious. Instead, we’re going to tell readpst to write each object into its own file. The command looks like:

readpst -S -o out/ outlook.pst

Where out/ is the directory where you’d like readpst to output the files and outlook.pst is the PST file that you’re extracting data from. The -S flag indicates that you’d like readpst to extract each object separately, rather than in mbox format.

Once readpst has finished, in your output directory you’ll find a directory structure that matches the folder structure of the PST (generally starting with a base directory of Outlook). Within each of these folders you’ll find numerically named files that contain plain text representing the exported object (e.g. for an email message you’ll find the message body, headers, etc).
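Since one of my reasons for avoiding Outlook was getting at header data, here is a rough Python sketch that walks the readpst output tree and pulls a few headers out of each extracted file. It assumes the files are RFC 822 style messages, which is what I would expect for emails; contacts, calendar entries and attachments will simply come back with empty headers. The function names and the default header list are my own choices.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def iter_messages(root):
    """Yield (path, parsed headers) for every file under the output tree."""
    parser = BytesParser(policy=policy.default)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                # headersonly=True skips parsing of the message body
                yield path, parser.parse(fh, headersonly=True)


def header_summary(root, headers=("From", "To", "Subject", "Date")):
    """Return a list of (file path, {header: value}) tuples."""
    rows = []
    for path, msg in iter_messages(root):
        rows.append((str(path), {h: str(msg.get(h, "")) for h in headers}))
    return rows
```

Calling header_summary("out/") would then give you one row per extracted object, which is easy to dump to CSV or filter further.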

Working with the extracted data

Thanks to readpst, it is quite trivial to extract all data within a PST file into a nicely organized (and basically human readable) set of files and at this point you can begin processing these files as you would any other text file. For example, a commonly seen forensic task would be to search all objects within a PST for certain keywords or perhaps a pattern. As an example of pattern matching, let’s say you were investigating a PII incident and you wanted to see whether a subject had utilized email to send or receive emails that appear to contain social security numbers. You could use grep to search the files within the directory structure that readpst created with the following command:

This is telling grep to run a recursive search using a regular expression which will match numbers that look like SSNs in the readpst output directory. From there, you could even automate this process using a script to automatically move matching messages to a target folder that you could manually validate (or whatever the next step of your given workflow is).
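If you’d rather script the search than run grep by hand, the same idea can be sketched in Python. The regular expression below is a simple SSN-like pattern of my own (three digits, two digits, four digits, separated by hyphens) and is not necessarily the one I list on the Resources page; expect false positives and validate the hits manually.

```python
import re
from pathlib import Path

# Assumed SSN-like pattern: ddd-dd-dddd, not embedded in a longer digit run
SSN_LIKE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def search_tree(root, pattern=SSN_LIKE):
    """Recursively search every file under `root`, grep -r style.

    Returns a list of (file path, line number, matching line) tuples.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps binary attachments from stopping the scan
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

From the returned list it is a short step to copying the matching files into a review folder for manual validation.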

As you can see, forensically analyzing PST files using freely available software is quite easy and can be a very powerful method for efficiently extracting case-pertinent data. Give it a try sometime…

On a side note, I’ve added a new Resources section to my blog and one of the pages contained within this section is dedicated to listing useful regular expressions (such as the SSN matching regular expression I used above). Right now, that is the only one I have up there, but I’ll keep adding to this page as I think of other useful regular expressions, so check back regularly.

Every file system handles MAC times slightly differently; however, Sleuth Kit (as well as other forensics software products) uses the same acronym/fields no matter which file system you’re analyzing. Here’s a quick run-down of some popular file systems and what the M, A, C, and B mean:
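As a quick reference, the mapping can be captured in a small Python lookup table. The values below reflect my understanding of how the Sleuth Kit documentation describes these fields, so please verify them against the current Sleuth Kit documentation before quoting them in a report. None means the file system does not record that value, and the helper name is my own.

```python
# Meaning of the M, A, C and B columns per file system (to be verified
# against the Sleuth Kit documentation; None = not recorded).
MACB_MEANING = {
    "NTFS": {"M": "File modified", "A": "Accessed",
             "C": "MFT entry modified", "B": "Created"},
    "FAT": {"M": "Written", "A": "Accessed",
            "C": None, "B": "Created"},
    "Ext2/3": {"M": "Modified", "A": "Accessed",
               "C": "Inode changed", "B": None},
    "UFS": {"M": "Modified", "A": "Accessed",
            "C": "Inode changed", "B": None},
}


def describe(file_system, letter):
    """Return what `letter` means on `file_system`, or 'n/a' if unused."""
    meaning = MACB_MEANING[file_system][letter.upper()]
    return meaning if meaning is not None else "n/a"


print(describe("NTFS", "C"))
print(describe("FAT", "c"))
```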

Harlan Carvey recently wrote a post on his blog called Accessing Volume Shadow Copies, which provided some excellent instruction on how you can go about accessing Volume Shadow Copies (VSCs) from an existing image without having to use expensive tools (in fact, his solution uses completely free tools). In Windows 7 and Vista, VSS is turned on by default and thus additional artifacts are possibly just waiting to be discovered. Accessing a system’s VSC(s) can be highly useful in an investigation and can possibly help you get your hands on older copies of registry hives (i.e. being able to get historical UserAssist data, etc) as well as other older file snapshots (pictures, etc), which can come in very handy. So, needless to say, if you’re not presently looking for VSCs as part of your forensics workflow, you probably should be…

In the process of testing Harlan’s procedure, I started to wonder how Windows, by default, decides to generate these VSCs (and what is included). I came up with some data and I thought that I’d go ahead and post my findings (feel free to comment if I’ve gotten anything wrong here):

A scheduled task (named SR in Win7) controls when a snapshot occurs. By default, the task is set to run at 12:00AM every day and 30 minutes after every system startup, but will only execute when the system is plugged in and has been idle for 10 minutes. If the system is not idle, the task will continue to wait for idle for 23 hours.

If a restore point/snapshot has not been successfully created in the last seven days, system protection will create one automatically.

And finally, a restore point may be created “automagically” as part of certain software installation/driver installation processes.

The bottom line: it is basically impossible to predict with any degree of certainty when a snapshot will occur.

All files/folders are covered in a volume snapshot, except for those defined under the HKLM\System\CurrentControlSet\Control\Backup Restore\FilesNotToSnapshot registry key.

If a file is modified several times between snapshots, only the version that was current when the restore point/VSC was made will be available to you for analysis. Mind you, there may be multiple VSCs available, so that can be helpful with getting further historical revisions.

Let’s just say that you’re doing an investigation of a subject and that the investigation centers around proving that they visited a certain website. Of course, you’ve checked the usual places: history files, typed URLs in the registry, etc – but everything looks pretty clean. In fact, things are looking a bit *too* clean. You’re starting to suspect that the subject may have used some sort of private browsing mode while browsing. All is not lost, especially if the subject used Internet Explorer 8’s InPrivate Browsing mode, because thankfully (for us at least) Internet Explorer leaves some artifacts lying around that include URLs and sometimes even a site title. Let’s jump in…

Internet Explorer creates recovery files, which are used in the event that the browser does not exit cleanly (i.e. crashes) in order to restore the browser state upon restart. These “files” are stored in memory and depending on what the OS does from a memory management standpoint, could possibly be placed on the disk via the pagefile or hibernation file. If the recovery files are indirectly written to the disk, we’re in luck. There are, however, a few limitations that you should note. First, there is limited metadata available – given where these artifacts were originally written (i.e. pagefile.sys and hiberfil.sys) there may be limited (if any) pertinent metadata available for the artifacts that you find, so it may be difficult to place the activity on a timeline. Second, you will not be able to definitively state that a subject utilized InPrivate Browsing to visit the URLs that you find – all you can say is that the URLs found during your search for these “recovery files” have been visited at some point in time on the system where the drive was connected (of course, be careful about possible drive residue if the drive was previously used in a different system).

To find these artifacts, you’re going to be looking for a specific signature on a drive image. The signature appears to be highly predictable (at least based upon my limited research) and presents as follows:

Lead-in (hex): 61 80 00 00 00 00

URL (variable length)

If there is a title present:

10 bytes (doesn’t appear to be consistent, however sometimes 0x2A or 0x40 appears in the 7th byte – I’m sure these have some significance, I just haven’t figured out what yet)

Website title (variable length)

Lead-out (hex): 01 00 + 6 bytes of null padding + FF FF FF FF (may not consistently appear, and doesn’t appear to be used if a title is not present)

Remember, we’re dealing with artifacts here, in a pagefile or hibernation file, and most likely retrieved from unallocated space – so the presentation may not be always consistent (or readable for that matter). When viewed in a hex editor, a complete artifact looks like this:

So given you now know what these artifacts look like, you should be able to quickly write yourself a little script (I’m actually working on one right now – I’ll publish it once it has been completed) to find these artifacts to include in your evidence acquisition arsenal. Or feel free to use a commercial tool designed to find these artifacts (like Internet Evidence Finder)…
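In the meantime, here is a minimal Python sketch of what such a script might look like. It only looks for the lead-in bytes and then reads whatever printable text follows; it does not attempt to parse the title or the lead-out. Two assumptions of mine are baked in: that the URL may be stored either as plain ASCII or as UTF-16LE (the sketch guesses based on whether the second byte is null), and that anything worth reporting starts with http://, https:// or ftp://.

```python
LEAD_IN = bytes.fromhex("618000000000")  # 61 80 00 00 00 00
SCHEMES = ("http://", "https://", "ftp://")  # assumed URL prefixes


def _read_text(data, start, max_chars=2048):
    """Read a run of printable ASCII (or UTF-16LE ASCII) from `start`."""
    wide = data[start + 1:start + 2] == b"\x00"  # guess the encoding
    step = 2 if wide else 1
    chars = []
    pos = start
    while pos + step <= len(data) and len(chars) < max_chars:
        unit = data[pos:pos + step]
        if wide and unit[1] != 0:
            break
        if not 0x20 <= unit[0] <= 0x7E:
            break
        chars.append(chr(unit[0]))
        pos += step
    return "".join(chars)


def find_candidates(data):
    """Return (offset, url) for each lead-in that is followed by a URL."""
    hits = []
    pos = data.find(LEAD_IN)
    while pos != -1:
        text = _read_text(data, pos + len(LEAD_IN))
        if text.lower().startswith(SCHEMES):
            hits.append((pos, text))
        pos = data.find(LEAD_IN, pos + 1)
    return hits
```

For a full disk image you would not want to read everything into memory at once; find_candidates only needs find() and slicing, so passing it an mmap object over the image should work as a starting point.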

I’ve used searching in my previous PowerShell posts, but I thought that it deserves a dedicated “Quick Tip” posting. I know that folks coming from a *nix background will be very familiar with using grep to search for pretty much anything and seemingly not having access to this tool can be disappointing for those trying to use Windows as their primary OS (for the one or two of you out there that have decided to come to the “dark side” 😎 ). But…do not fret! There are a number of ways to run equivalent searches within Windows out of the box. Since I’ve been on a PowerShell kick lately, let me introduce you to a decent grep alternative that is built into PowerShell: select-string.

Select-String is a built-in cmdlet in PowerShell that will allow you to search files, piped input, objects, etc for a pattern (which is, by default, a regular expression). Select-String can take in a number of options, but can be quite simple to use. For example, if you want to search for the text “evildoer” within all files in the current directory, you can use the following command:

select-string .\*.* -pattern "evildoer"

It is important to note that by default, select-string is case-insensitive; so, if you need a case sensitive search, add in the -CaseSensitive parameter.

Ok, now let’s do something a bit more complex. I want to look for anything that looks like an email address in all txt files recursively under C:\. So, to make that happen we need to do a little more work:

First, I need to use get-childitem (think DIR) to recursively go through the drive and return only files matching *.txt. Then I pipe these returned objects to select-string and search their contents by using a basic regex that will match on things that look like email addresses.
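For comparison (and for the folks who end up scripting this on a box without PowerShell), the same two-stage idea can be sketched in Python: walk the tree for *.txt files, then search each file’s contents with a regex. The email pattern is a basic one of my own and the function name just mirrors the cmdlet; like select-string, the search is case-insensitive unless you ask otherwise.

```python
import re
from pathlib import Path

# Basic "looks like an email address" pattern (an assumption, not exhaustive)
EMAIL_LIKE = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"


def select_string(root, pattern, glob="*.txt", case_sensitive=False):
    """Yield (path, line number, line) for each matching line.

    Mirrors the pipeline above: recurse under `root`, keep files
    matching `glob`, then search their contents with `pattern`.
    """
    regex = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it and keep going
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield str(path), lineno, line
```

Something like list(select_string("C:\\", EMAIL_LIKE)) would be the rough equivalent of the search described above.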

Of course, there are a number of other ways to use select-string but since this is a “quick tip” I’ll keep things brief. If you’d like more details, you can find additional information in the PowerShell documentation on TechNet.

On a cold and rainy Thursday morning, I thought that it would be a good time to write a post on searching the Windows registry using PowerShell. In an Incident Response scenario you may want or need to do some live analysis on a compromised system, and part of this analysis may be to search the registry for some sort of artifact that is appropriate. Using PowerShell can help you do this in a relatively efficient manner and is, of course, built in on newer versions of Windows (i.e. Windows 7, 2008, etc).

For example, let’s say that you know (or have guessed) that you’re dealing with some sort of malware that is probably going to be calling home at some time and you are wanting to look through the registry to see if the malware author decided to store any IPs/URLs in the clear. In PowerShell you are able to easily browse and search through the registry, just like you were dealing with a filesystem. There are a number of ways to accomplish this (for example, using -match rather than select-string), so feel free to use whatever method you’re comfortable with. But, let me show you how I mangled my way through it this morning…

Open up a PowerShell window.

Let’s look for things that appear to be IP addresses under HKEY_CURRENT_USER, so first I need to recursively iterate through everything under that hive. I do this by using the Get-ChildItem cmdlet:

Get-ChildItem HKCU:\ -rec -ea SilentlyContinue

This cmdlet returns a complete list of all keys (as objects of course) under the HKCU hive.

From there, we’re going to need to dig into each of these returned objects and do our search. So I’m going to pipe the output of the previous command into a foreach loop and then retrieve the data for each key:

You’ll notice the use of a simple regular expression that will match on things that “look” like IP addresses. If, for example, you’d prefer to look for URLs, a simple regex that you can use that’ll match most URLs would be: “\b(ht|f)tp(s?)[^ ]*\.[^ ]*(\/[^ ]*)*\b”.

So putting it all together, to perform a simple string search of the registry for possible IP addresses and URLs using a regular expression you can use the following script:

This code will return any hits on the specified regular expressions, but doesn’t actually give you context as to where it was found within the registry. If you’re just looking for odd URLs/IP addresses, it may be useful for you to just see a simple list of both to run through; but, if you want more context you may want to use a conditional with -match rather than select-string and then just output $CurrentKey:
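To illustrate the “hits with context” idea outside of PowerShell, here is a Python sketch that keeps the key path and value name alongside each match. The URL regex is the one given above; the IP regex is a simple pattern of my own. The find_hits function works on any iterable of (key path, value name, data) tuples, and walk_hkcu is an optional helper that uses the standard winreg module, so it only runs on Windows. Only string (REG_SZ style) data is searched in this sketch.

```python
import re

IP_LIKE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # assumed pattern
URL_LIKE = re.compile(r"\b(ht|f)tp(s?)[^ ]*\.[^ ]*(\/[^ ]*)*\b")


def find_hits(values, patterns=(IP_LIKE, URL_LIKE)):
    """Return (key path, value name, matched text) for every match."""
    hits = []
    for key_path, name, data in values:
        if not isinstance(data, str):
            continue  # skip binary, numeric and multi-string data
        for pattern in patterns:
            for match in pattern.finditer(data):
                hits.append((key_path, name, match.group(0)))
    return hits


def walk_hkcu(subkey=""):
    """Yield (key path, value name, data) under HKCU. Windows only."""
    import winreg  # imported here so the rest runs on any platform

    try:
        key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, subkey)
    except OSError:
        return  # access denied or key vanished, like -ea SilentlyContinue
    with key:
        n_subkeys, n_values, _ = winreg.QueryInfoKey(key)
        try:
            for i in range(n_values):
                name, data, _type = winreg.EnumValue(key, i)
                yield "HKCU\\" + subkey, name, data
            children = [winreg.EnumKey(key, i) for i in range(n_subkeys)]
        except OSError:
            children = []  # key changed while we were enumerating it
    for child in children:
        yield from walk_hkcu(subkey + "\\" + child if subkey else child)
```

On a live Windows system, find_hits(walk_hkcu()) would give you the full list; on an analysis box you can feed find_hits with values exported by whatever registry parser you prefer.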

PowerShell can be a really powerful tool for easily searching the registry and is a good, albeit slightly slower, alternative to using another method that would require an interpreter, etc (i.e. Perl). Have fun!

Along with disabling autoplay/autorun, you may want to consider turning off the automount functionality on Windows systems requiring high security. It is also a decent secondary protection on a forensics workstation (you are using a hardware write blocker as well….right? :-)).

To disable automount (this has been tested under Windows 7) either:

run diskpart and once at the prompt type: automount disable

or, execute: mountvol /N

or, set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MountMgr\NoAutoMount to 1 in the registry (you’ll see this entry change appropriately if you use one of the previously mentioned commands).

NOTE: the commands mentioned above will need you to “Run as an Administrator” in Windows 7.