One of the more useful network forensic skills is the ability to extract files from packet captures. This process, known as packet data carving, is crucial when you want to analyze malware or other artifacts of compromise that are transferred across the network. That said, packet data carving has varying degrees of difficulty depending on the type of traffic you are attempting to extract data from. Carving files from simple protocols like HTTP and FTP can be done in a matter of minutes and is usually cut and dried enough that it can be automated with tools like Foremost and NetworkMiner.

There are articles all over the Internet about carving files from simple protocols so I won't rehash those. Instead, I want to take a look at two more complex protocols that are extremely common in production networks. Server Message Block (SMB) is the application-layer protocol that Microsoft operating systems use for file sharing and communication between networked devices. If you live on a Microsoft network (or a Unix network that utilizes Samba) then you are a user of SMB or SMB2, depending on your operating system version. In this article I'm going to discuss the art of carving files from SMB and SMB2 traffic. If you want to follow along you'll need to download a copy of Wireshark (http://www.wireshark.org) and your favorite hex editor. I've used Cygnus Hex Editor (http://www.softcircuits.com/cygnus/fe/) for the purposes of this article since it's simple and a free version exists.

Carving Files from SMB Packets

The first version of SMB is in use on all modern Microsoft operating systems prior to Windows Vista. In order to set up a packet capture for this scenario I took two Windows XP SP3 virtual machines running on VMware Workstation and placed them on the same network. Once they were able to communicate with each other I set up a shared folder on one host (192.168.47.132) that acts as the server. I then fired up Wireshark and began capturing packets as I copied an executable file from the client (192.168.47.133) to the server's shared folder. The resulting packet capture is called smb_puttyexe_xfer.pcap.

If you’ve never looked at SMB traffic then don’t get scared by all the different types of SMB packets in the capture; we will only be looking at a few of them. This article isn’t meant to be an exhaustive reference on each and every type of SMB packet (there are over a hundred of them), so if you want the gory details then take a look at the references at the end of this article.

In order to carve the file out of these packets we have to find some basic information about it. Before and after transferring a file to a server, the client will attempt to open the file in order to see if it exists. This is done with an SMB NT Create AndX Request packet. The server replies with an SMB NT Create AndX Response, which contains the name, extension, and size of the file being transferred. This is everything we need to get started. You can filter for Create AndX Response packets in Wireshark with the filter (smb.cmd == 0xa2) && (smb.flags.response == 1). If we examine one of the responses that occur after the file has been transferred, we can identify that the file being transferred is putty.exe and its file size is 454,657 bytes. We will use this information later.

Figure 1: Note the file name, extension, and size.

The next step we have to take in order to extract this file is to isolate the appropriate block of traffic. Wireshark makes this pretty easy with its Follow TCP Stream functionality. Start by right-clicking any packet in the capture file and selecting Follow TCP Stream. This will bring up a window that contains all of the data being transferred in this particular communication stream concatenated together without all of the layer 2-4 headers getting in the way. We are only concerned about the traffic transferred from the client to the server so we will need to specify this in the directional drop down box by selecting 192.168.47.133 –> 192.168.47.132 (458592 bytes). Click Save As and save the file using the name putty.raw.

Figure 2: Saving the isolated traffic from Wireshark

If you view the properties of the data you just extracted and saved, you should find that its file size is 458,592 bytes. This is 3,935 bytes more than the size of the actual file that was transferred. This means that our goal is to get this raw file's size down to exactly 454,657 bytes. This is where the real carving begins.

First things first, we have to delete all of the extra data that occurs before the executable data actually begins. Since we know that the transferred file is an executable, the quickest way to do this is to look for the executable header and delete everything that occurs before it. The executable header begins with the hex bytes 4D 5A (MZ in ASCII), which occur approximately 1,112 bytes into the putty.raw file. Once deleted, resave the file as putty.stage1. You should now be down to a file size of 457,480 bytes.

Figure 3: Removing added bytes from the beginning of the file

Now things get a bit trickier. SMB transmits data in blocks. This is great for reliability since a lost or damaged block can be retransmitted, but it adds some extra work for us. This is because each block must contain some bytes of SMB header data in order to be interpreted correctly by the host that is receiving it. The good thing is that the size of this data is somewhat predictable, but you have to understand a bit more about SMB in order to put the rubber to the road. The thing to know here is that the data block size in SMB is limited to 64KB, or 65536 bytes. Of this amount, only 60KB is typically used for each block. These 61,440 bytes are combined with an additional 68 bytes of SMB header information. This means that after every 61,440 bytes of data we will have to strip out the next 68 bytes.

There is one thing to add to this that must be taken into consideration before stripping out those bytes. As a part of the normal SMB communication sequence, an additional packet is sent right after the first block. This is an NT Trans Request packet, which is packet 77 in the capture file. The SMB portion of this packet is 88 bytes, which means we will have to remove those 88 bytes in addition to the 68 bytes that make up the normal SMB block header, for a total of 156 bytes.

Now that we have all that sorted out, let's start removing bytes. In your hex editor, skip one byte past the 61,440th byte; this will be offset 0x0F000. Starting with this byte, select a range of 156 bytes and delete them. Save this file as putty.stage2.

Figure 4: Removing the initial 156 bytes

Things get a bit easier now, as we are just concerned with stripping out the 68 bytes after every block. Skip through the file in 61,440 byte increments, deleting 68 bytes each time. This should occur six times in this file, at offsets 0x1e000, 0x2d000, 0x3c000, 0x4b000, 0x5a000, and 0x69000. Once finished, save the file as putty.stage3.

Figure 5: Removing a 68 byte SMB header block

Go ahead and take a look at the file size of putty.stage3. We are still 2,259 bytes off from our target, but luckily the last part is the easiest. The data stream is just padded by some extra information that needs to be deleted. We know that the file should be 454,657 bytes, so browse to that byte and delete everything that occurs after it.

Figure 6: Trimming the extra bytes off the end of the file

Save the final product as putty.exe and if you did everything right, you should have a fully functioning executable.

Figure 7: Success! The executable runs!

The whole process can be broken down into a series of repeatable steps:

Record the file name, extension, and size by examining one of the SMB NT Create AndX Response packets

Isolate and extract the appropriate stream data from Wireshark by using the Follow TCP Stream feature and selecting the appropriate direction of traffic

Remove all of the bytes occurring before the actual file header using a hex editor

Following the first 61,440 byte block, remove 156 bytes

Following each successive 61,440 byte block, remove 68 bytes

Trim the remaining bytes off of the file so that it matches the file size recorded in step 1
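Since the offsets are predictable, the steps above lend themselves to automation. Below is a minimal, hypothetical Python sketch of the process. It assumes the same defaults seen in this capture (60KB write blocks, a 68-byte SMB header after each block, and one extra 88-byte NT Trans Request after the first block); the function name and parameters are my own, and the naive MZ search would need real validation against an actual capture.

```python
def carve_smb(raw: bytes, file_size: int, block: int = 61440,
              hdr: int = 68, first_extra: int = 88) -> bytes:
    """Carve a file out of a raw client-to-server SMB stream.

    `raw` is the saved Follow TCP Stream data, `file_size` the size
    recorded from the NT Create AndX Response packet.
    """
    # Step 3: drop everything before the executable header (MZ).
    # Naive: a real tool should verify this is the true PE header.
    data = raw[raw.find(b"MZ"):]
    out = bytearray()
    pos = 0
    first = True
    while len(out) < file_size and pos < len(data):
        out += data[pos:pos + block]          # keep one data block
        pos += block
        # Steps 4-5: skip the SMB header bytes that follow each block;
        # the first block is also followed by the NT Trans Request.
        pos += hdr + (first_extra if first else 0)
        first = False
    # Step 6: trim trailing padding down to the recorded file size.
    return bytes(out[:file_size])
```

With `smb_puttyexe_xfer.pcap`, you would call this on the saved putty.raw bytes with a file size of 454,657 and write the result out as putty.exe.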

Carving Files from SMB2 Packets

Microsoft introduced SMB2 with Windows Vista and has used it in its newer operating systems moving forward. In order to set up a packet capture for this scenario I took two Windows 7 (x32) virtual machines running on VMware Workstation and placed them on the same network. Once they were able to communicate with each other I set up a shared folder on one host (192.168.47.128) that acts as the server. I then fired up Wireshark and began capturing packets as I copied an executable file from the client (192.168.47.129) to the server's shared folder. The resulting packet capture is called smb2_puttyexe_xfer.pcap.

You should notice that this traffic is a little bit cleaner than the SMB traffic we looked at earlier. This is because SMB2 is optimized so that there are far fewer commands. Whereas SMB had over a hundred commands and subcommands, SMB2 only has nineteen. Regardless, we still need to find the name of the file being transferred and the size of that file. One of the best places to do this is at one of the SMB2 Create Response File packets. This packet type serves a purpose similar to that of the SMB NT Create AndX Response packet. You can filter these out in Wireshark with the filter (smb2.cmd == 5) && (smb2.flags.response == 1). The last one of these in the capture, which is packet 81, is the one we want to look at since it occurs after the file transfer is complete. It identifies the file name as putty.exe and the file size as 454,656 bytes. This is indeed the same file as in our earlier example, but it is being reported as one byte smaller. The missing byte is just padding at the end of the file and has a null value, so it's not of any real concern to us.

Figure 8: Once again we note the file name, extension, and size

At this point you should perform the same steps as we did earlier to isolate and extract the data stream from the capture using Wireshark's Follow TCP Stream option. Doing this should yield a new putty.raw file whose file size is 459,503 bytes. This is 4,847 bytes too big, so it's time to get to carving.

Once again we need to start by stripping out all of the data before the executable header. Fire up your favorite hex editor and remove everything before the bytes 4D 5A. This should account for a deletion of 1,493 bytes.

Figure 9: Removing the extra bytes found prior to the executable header

Now things change a bit. SMB2 works in a manner similar to SMB, but it allows for more data to be transferred at once. SMB had a maximum block size of 64KB because it is limited to 16-bit data sizes. SMB2 uses either 32-bit or 64-bit data sizes, which lifts the 64KB limit. In the case of the transfer taking place in the sample PCAP file, these were two 32-bit Windows 7 hosts under their default configuration, which means that the block size is set at 64KB. Unlike SMB, however, the full 64KB is used, so we will see data being transferred in chunks of 65,536 bytes. These 65,536 bytes combine with a 116-byte SMB2 header to form the full block.

SMB2 doesn’t include an additional initial request packet like the SMB NT Trans Request, so we don’t have to worry about stripping out any extra bytes right off the bat. As a matter of fact, some might say that carving data from SMB2 is a bit easier since you only have to strip out 116 bytes after each block of 65,536 bytes. You can do this now on putty.stage1. In doing so you should be deleting 116 bytes of data at offsets 0x10000, 0x20000, 0x30000, 0x40000, 0x50000, and 0x60000.

Figure 10: Removing 116 bytes of data following the first 65,536-byte chunk

Once you’ve finished this, save the file as putty.stage2. All that is left is to remove the final trailing bytes from the file. In order to do this, browse to byte 454,656 and delete every byte that occurs after it.

Figure 11: Removing the final trailing bytes

Finally, save the file as putty.exe and you will have a fully functioning executable. The process of carving a file from an SMB2 data stream breaks down as follows:

Record the file name, extension, and size by examining one of the SMB2 Create Response File packets

Isolate and extract the appropriate stream data from Wireshark by using the Follow TCP Stream feature and selecting the appropriate direction of traffic

Remove all of the bytes occurring before the actual file header using a hex editor

Following each 65,536 byte block, remove 116 bytes

Trim the remaining bytes off of the file so that it matches the file size recorded in step 1
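These steps can also be sketched in Python. This is a hypothetical proof of concept assuming the defaults seen in this capture: full 65,536-byte blocks, each followed by a 116-byte SMB2 header, with no extra initial bytes to strip; the function name and parameters are illustrative, not part of any existing tool.

```python
def carve_smb2(raw: bytes, file_size: int, block: int = 65536,
               hdr: int = 116) -> bytes:
    """Carve a file out of a raw client-to-server SMB2 stream.

    `raw` is the saved Follow TCP Stream data, `file_size` the size
    recorded from the SMB2 Create Response File packet.
    """
    # Drop everything before the executable header (MZ); as with SMB,
    # a real tool should validate this is the true PE header.
    data = raw[raw.find(b"MZ"):]
    out = bytearray()
    pos = 0
    while len(out) < file_size and pos < len(data):
        out += data[pos:pos + block]   # keep one data block
        pos += block + hdr             # skip the SMB2 header that follows
    # Trim the trailing padding down to the recorded file size.
    return bytes(out[:file_size])
```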

Conclusion

That’s all there is to it. I’ll be the first to admit that I didn’t cover every single aspect of SMB and SMB2 here, and there are a few factors that might affect your success in carving files from these streams, but this article shows the overall process. Taking this one step further, it’s pretty reasonable to assume that this process can be automated with a quick Python script, but this is something I’ve not devoted the time to yet. If you feel like taking up that challenge then be sure to get in touch and I’ll be glad to post your code as an addendum to this post. In the meantime, happy carving!

In the realm of network security monitoring and intrusion analysis we are all slaves to our data. Typically speaking, we rely on two different types of data at the network layer; full content data (PCAP) and session data (Netflow). Both are pretty easy to generate given the right sensor placement, and there are a lot of great resources out there for learning how to get good value out of the data. That said, they do each have their own shortcomings as well.

Session Data (Netflow)

Netflow is a standard form of session data that details the ‘who, what, when, and where’ of network traffic. I tend to equate this to the call records you’ll see on your monthly cell phone bill.

Figure 1: Partial Netflow Records Exported from SiLK

The best thing about netflow is that it provides a lot of value with minimal disk storage overhead. It’s really a lot of bang for your buck. Most commercial grade routers and firewalls will generate netflow, and there are a lot of free and open source tools, such as SiLK, that can be used to generate and analyze netflow as well. There is even a yearly conference called FloCon where people get together and talk about cool things you can do with netflow. The only real downside to netflow data is that it doesn’t paint a complete picture, so it’s often best used as a complement to full content data.

Full Content Data (PCAP)

If netflow session data is equivalent to a call log, then full content data in the form of PCAP is just like having a full recording of all of your calls.

Figure 2: PCAP Data Investigation with Wireshark

The PCAP format has become very universal and can be collected and analyzed with a variety of free and open source applications like Dumpcap, Tcpdump, Wireshark, and more. A lot of the more popular intrusion detection systems, such as Snort, use the PCAP format as well. As an analyst, having PCAP data available tends to make the analytical process a dream come true, as it provides the highest level of context when investigating an anomaly. The primary downside to full content data is that it has an incredibly high disk storage overhead, which prevents most organizations from collecting and storing any reasonable amount of it. In my experience, the organizations that are capable of collecting and storing PCAP can only measure the amount stored in hours, rather than days. In addition, unless you have an idea of what you are looking for within a reasonable time range, it can be difficult to locate things, which somewhat impedes flexibility in analysis.

Application Layer Metadata

The concept of application layer metadata originally presented itself to me in a discussion about additional data types useful within the network security monitoring function that would serve as a happy medium between session data and full content data. It didn’t take a lot of number crunching to find that on most of the networks we monitored, the vast majority of the traffic was the application layer data of a few common protocols. The largest of these was HTTP, followed by the other usual suspects: SSL, DNS, and SMTP.

Starting with a couple of these protocols as a baseline, we quickly realized that we could save ourselves a lot of disk storage overhead by eliminating the stuff we didn’t need. There are an unlimited number of ways to do this, but we wanted to go with the keep-it-simple philosophy, so we started by using tcpdump to read in our PCAP data, outputting the ASCII formatted data to a file. Then, we ran the Unix strings command on that file to get rid of any binary data that we couldn’t read anyway. We weeded out a few more things that we didn’t want through a magical combination of sed and awk, added in the appropriate timestamps, formatted the data a bit prettier, and we had achieved our goal.

The end result of a reasonably small bash script was the ability to generate application layer metadata in the form of something we call a Packet String, or PSTR file (pronounced pee-stur). The script is ideally designed to run as a cron job where it parses continually generated PCAP files in order to generate accompanying PSTR files.

Figure 3: Sample PSTR Data
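To illustrate the core idea, here is a pure-Python stand-in for the tcpdump | strings | sed/awk pipeline described above: pull printable ASCII runs out of raw packet payloads (just as `strings` would) and keep only the header-like lines of interest. The `keep` prefixes shown are illustrative, not the full set the real script handles, and this sketch omits the timestamping and formatting steps.

```python
import re

# Like `strings`: runs of 6 or more printable ASCII bytes. HTTP and SMTP
# headers end in \r\n, which are non-printable, so each header naturally
# becomes its own run.
PRINTABLE = re.compile(rb"[\x20-\x7e]{6,}")

def pstr_lines(payload, keep=(b"GET ", b"POST ", b"Host:",
                              b"User-Agent:", b"Subject:")):
    """Extract PSTR-style header lines from a raw packet payload."""
    return [m.group().decode("ascii")
            for m in PRINTABLE.finditer(payload)
            if m.group().startswith(keep)]
```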

You can download the bash script that generates this data from PCAP files here. This is provided as a simple proof of concept and takes an input PCAP file and generates an output PSTR file. Now that we’ve got application layer metadata being generated in the form of PSTR files, let’s take a look at a few use cases.

Using PSTR as a Data Source

The original goal of generating PSTR files was to provide a data format with a low disk storage overhead that provided value to analysts as a secondary NSM data source. In a typical workflow, analysts would take an input from a detection capability, such as an IDS, and then PSTR would be another data source available to the analyst in order to provide supporting evidence in the analysis of a potential event or incident. I’ve written a few use cases here. Some of these are theoretical, but others are examples of actual things that have happened since implementing the PSTR data type.

Malware Infection Use Case

Let’s look at an example in which we’ve just received an alert from our IDS stating that an internal system has been detected exhibiting symptoms of infection. The signature that fired did so because it saw a malicious GET request associated with a known botnet C2 server. The host was examined, and it appeared as though the GET request matched what was expected as a result of the signature that fired, so we were able to determine with reasonable certainty that this box was infected.

Upon closer examination, we also noticed that the infected host was sending an HTTP POST with a very unique string. This looked like it might be an indicator of malicious activity, but it wasn’t something that any of our existing signatures fired on. In this case, an analyst was able to use grep to quickly find other instances of this same string within the HTTP header data of all traffic on our monitored networks. PSTR data proved to be incredibly useful in finding other infected boxes across multiple networks.

Targeted Phishing Use Case

As a theoretical example, consider a scenario where several users have contacted your security team because they’ve received a very suspicious e-mail that seems to be targeted specifically at your company. This e-mail mentions a payroll adjustment and asks the recipient to access the provided link and log in with their employee ID number and password.

After examining the e-mail, you’ve determined that it has been sent from a spoofed e-mail address and that it uses a slightly modified subject line that is unique to each recipient. You’ve also noticed that, based upon the reports you’ve received from users, these e-mails have come in over the past several weeks. One of the things you would want to do in this case is find out who within your organization received this e-mail. The purpose of this is to be able to warn those users not to click the link in the e-mail, and also in hopes that you might find a pattern as to why the selected recipients were chosen (access to certain systems, high profile employees, etc.).

Typically, you might search through Exchange or Postfix logs to see if you can find who the recipients were. This of course relies on your organization having adequate logging and retention of those logs. The semi-unique nature of the subject lines, however, makes it difficult to query these data sources. Using PSTR data, you can write a quick regular expression to match the semi-unique subject lines and run a very quick query that will give you these results.
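As a sketch of that kind of query, assuming PSTR data is stored as one header line per row, a short Python filter might look like the following. The "Payroll Adjustment" pattern and the function name are purely illustrative, standing in for whatever semi-unique subject lines the phish actually used.

```python
import re

def find_phish(lines, pattern=r"^Subject: Payroll Adjustment\b"):
    """Return PSTR subject lines matching a semi-unique phishing pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]
```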

Using PSTR as a Detection Capability

The one thing we didn’t really anticipate when we created the PSTR file type was its use as a second level detection capability. When I refer to second level analysis and detection, I’m referring to moving past near real-time detection to the point in which analysts start reviewing traffic retrospectively to find things that signatures don’t catch. This often involves statistical and anomaly based detection with large data sets. This is something PSTR is perfect for.

User Agent Use Case

The user agent field within an HTTP header is always a good source for catching the low hanging fruit when it comes to malware infections on a network. Lots of malware will use a custom value in this field that deviates from standard browser identifying strings. The detection technique I’ve seen most commonly deployed to catch these types of malware infections at the network level relies on IDS/IPS signatures. As a matter of fact, if you subscribe to the popular Snort rule sets then you are probably using their user agent rules to detect known bad user agents.
The only problem with that detection scenario is that malware is now being generated at a rate much faster than the AV and IDS companies can keep up with. As a result, there are a LOT of malicious user agents out there that aren’t accounted for. In addition to this, some malware uses randomly generated user agent strings, meaning it’s much more difficult to write adequate signatures for detection.

One day, one of our analysts started playing around with PSTR data and wrote a quick script to parse all of the PSTR data for a given site, grab all of the user agent strings, and sort those by uniqueness. As expected, there were thousands of occurrences of the typical Firefox and Internet Explorer user agents, but what was really interesting was that there were several user agent strings only seen a handful of times that didn’t correlate to any particular known browsers. After a bit more analysis, we ended up finding quite a few machines that were infected with malware variants using these custom user agent strings. This one was a home run.
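A rough sketch of what that analyst's script might look like, assuming PSTR lines of the form "User-Agent: <value>" (the function name and rarity threshold here are my own, not the actual code):

```python
from collections import Counter

def rare_user_agents(pstr_lines, max_seen=5):
    """Count every User-Agent value seen in PSTR data and return the
    rarely seen ones -- the outliers worth a closer look."""
    counts = Counter(line.split(":", 1)[1].strip()
                     for line in pstr_lines
                     if line.startswith("User-Agent:"))
    return sorted((ua, n) for ua, n in counts.items() if n <= max_seen)
```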

E-Mail Subject Use Case

The previous use case got us thinking about other common fields within application layer metadata that we could do the same types of analysis on. One such field was the e-mail subject line. We modified our user agent parsing code to look at all PSTR data related to e-mail subject lines, and again had some very cool results.

Instead of most of the distribution being focused on one or two unique strings like we saw with user agents, we saw that the distribution was spread very widely across thousands of different subject lines. This was expected, since most e-mails have a unique subject line. What interested us here, however, was that we had a few subject lines that were used in excess. The first item of interest we found was that some sites had misconfigured applications that were mailing things to places they shouldn’t go, which was worth pursuing and getting fixed. We also found a user who was e-mailing all of his work documents to himself as a scheduled task every night, which was a policy violation.

This was all found with a very basic level of analysis, and it had some very real and useful results.

Additional Analytic Capabilities

The thing I love about this data format is that we can store a lot of it and it’s really quick to search through. With those things being true, there are a ton of things that can be done with it from a detection standpoint. A few immediate ideas include:

Searching for unique values with HTTP, SMTP, DNS, and SSL headers

This is what we did in most of these examples. You can really quickly sort through the unique values within certain fields and find the outliers that warrant additional investigation.

Byte entropy of certain fields to locate encrypted data where it shouldn’t be

It’s a common tactic to exfiltrate encrypted data through commonly used channels in an effort to hide in plain sight. Performing entropy calculations on GET and POST requests in an effort to find encrypted data would be a good way to detect where this might be occurring.
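A quick way to sketch this is with Shannon entropy, measured in bits per byte: encrypted or compressed data approaches 8.0, while plain text and typical URL-encoded parameters sit noticeably lower. The function below is a minimal illustration; a real detector would still need a tuned threshold and baselining per field.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; ~8.0 suggests encrypted or
    compressed content, plain text usually falls well below it."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```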

Checking the length of certain fields for anomalies

You can do some statistical analysis and determine that certain fields will often have values that have a length falling in a particular range. Using that, you can flag on outliers that are far too short or too long in order to look for anomalies. I’ve seen good success in doing this with the various HTTP header fields, e-mail subject lines, and SSL certificate exchanges.
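As a simple sketch of this idea, the snippet below flags values whose length falls more than a chosen number of standard deviations from the mean length. This is illustrative only; a real deployment would want a much larger baseline and possibly more robust statistics than mean and standard deviation.

```python
from statistics import mean, stdev

def length_outliers(values, sigmas=3.0):
    """Return values (e.g. URIs, subject lines) whose length is more
    than `sigmas` standard deviations from the mean length."""
    lengths = [len(v) for v in values]
    mu, sd = mean(lengths), stdev(lengths)
    return [v for v in values if sd and abs(len(v) - mu) > sigmas * sd]
```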

Enumerating Downloads of Particular File Types

There is a great deal of value in being able to list all of the executables or PDF files downloaded within a certain time span. This is easily and quickly achievable with analysis of HTTP header data within PSTR files.

Of course, all of these things CAN be done on PCAP data as well, but it would take significantly more processing power, and it’s likely that you can’t store enough PCAP data at a given time to make it worthwhile.

Conclusions

The concept of collecting and storing application layer metadata isn’t anything revolutionary. As a matter of fact, the idea isn’t even completely original as I’ve encountered other organizations that do similar things. There are even some commercial products that do this as well. However, I do know that nobody is sharing their methods and code with the world, which is the purpose of this post. Analysts live and die by their data feeds, and I think application layer metadata in whatever form it takes has its place amongst the other primary network data types. You can download the proof of concept code to generate and parse PSTR files here. I’m excited to see this data format evolve as we find more and more use for it. Look for more updates on this front as the code base continues to advance.

* A special thanks to my colleague Jason Smith for doing most of the legwork on writing the POC code.

I’m sitting in my hotel room after just finishing my last session at US-CERT GFIRST in Nashville, TN. This was my first time at GFIRST both as an attendee and presenter, and I really had a great time. Where I’m originally from in Kentucky isn’t too far from Nashville so I am familiar with the area and the venue choice, the Gaylord Opryland Hotel, is a beautiful facility and top-notch for this kind of conference. I wanted to take a moment to address where people can find the resources for my presentation as well as my thoughts on some of the presentations I had a chance to see and the conference as a whole.

My Presentation

Along with my friend and colleague Jason Smith, I presented a talk on Real World Security Scripting. At a bare minimum, we wanted to share some quick and dirty scripts we wrote to do some pretty neat things within our security operations center (SOC) at SPAWAR. At a higher level, we really hoped that we could encourage some people to get involved with low level BASH, Python, and PERL scripting to automate tasks within their SOC environment as well as increase the capabilities of the SOC and its staff. We generated quite a bit of interest, and as a result it looks like several people were turned away because the room was filled to fire code capacity. Our sincere apologies to those who missed the talk. We got some really positive feedback from folks who did make it to the presentation.

As promised, we will be releasing our slides and source code for the presentation. The slides can be downloaded here. As for the source code, we are maintaining the distribution release on https://www.forge.mil, which requires a DOD CAC or ECA certificate to access. I understand that a lot of government folks outside of DOD don’t have access to forge.mil, so we are trying to find another place to host this code where we can control access to only people in the .gov or .mil space. In the meantime, if you would like to get copies of the code, please e-mail me at my mil address (chris.sanders.ctr@nsoc.med.osd.mil) from your mil/gov address and I will get it over to you. We are hoping to get all of that bundled up by next week.

Presentations I Attended

Keynote Panel Discussion – “Unplug to Save”

I started the week on Tuesday by attending the opening ceremony, in which there was a panel discussion between several leaders in the government cyber defense community. The panel included Winn Schwartau, Mark Bengel, Doris Gardner, John Linkous, and John Pray, Jr., and was moderated by Bobbie Stempfley. If you aren’t familiar with those individuals I’ll leave the Googling to you :).

The discussion was centered on the concept of “unplug to save”, focusing on whether it is an acceptable solution to unplug an entity from the Internet in order to prevent a catastrophic event from occurring as a result of a cyber attack. The panel was split and brought up several good points about the interdependencies between certain aspects of government and national defense, namely citing the ones that are unknown. Truth be told, sometimes we just don’t know the effect that removing certain networks from the Internet would have. I’m of the opinion that in some cases hitting the kill switch is the best policy, but only in extreme circumstances, and I’m not sure where that authority should rest. The panel also got into a discussion of the inherently flawed nature of the Internet and the need for an architecture redesign. That is all fine and dandy and I won’t disagree…but until some form of governing body takes on the task of redesigning the fundamental protocols of the Internet, and it is taken seriously, this is just a pie in the sky dream.

The only thing that really irked me during the discussion was when one of the panelists mentioned how we could “solve the cyber problem” by hiring the types of hackers who can’t get clearances. It would seem that doing such a thing would be a prime way to generate more Bradley Manning-esque cases. Granted, Manning wasn’t a computer security expert by any means, but imagine what someone with his kind of access could do with a bit of hacking knowledge. I’d just as soon we make cyber jobs within the government more attractive to young professionals so that they stay on the straight and narrow, instead of the USG resorting to hiring criminals.

Internet Blockades

This talk was presented by Dr. Earl Zmijewski from Renesys and was one of the talks I enjoyed the most. He described several types of Internet censoring, blocking, and filtering techniques used across the world, citing recent examples from Egypt, Libya, North Korea, and of course, the great firewall of China. All of his examples had technical data to back them up, which really left me satisfied. Random fact: North Korea only has 768 public IP addresses.

This talk was centered on the creation of metadata from layer 7 data on the network. This isn’t an entirely new concept, but it’s one that most people are just now keying in on. The general idea is that you can strip out only the layer 7 data from HTTP/DNS/e-mail streams, index it, and store it so that you can perform analysis on it. The benefit here is that the amount of disk space required for storage of this type of data is much less than storing full PCAP, allowing for more long term analytics. The talk was presented by David Cavuto from Narus, who described a few useful analytics I hadn’t thought of. For example, collecting the lengths of HTTP request URIs and performing a standard deviation on those to look for outliers. This could potentially find incredibly long or incredibly short URIs that might be generated by malicious code.

Unfortunately, being a vendor talk, Mr. Cavuto didn’t provide anything that would help people generate layer 7 metadata, though he did have a product for sale that would do it. Fortunately, I have some code that will generate this type of metadata from PCAP. I’m going to button that up and release it here at some point…for free 🙂

This was, by far, my favorite presentation of the week. It was given by Eddie Schwartz, the new CSO at RSA. The talk was centered around investing time in the right areas of analysis, namely looking across the data sources that matter and not relying on the IDS to do all the work. Once Mr. Schwartz releases his slides I would recommend checking them out. He is a man who understands intrusion detection and how to make it effective. My favorite part of his talk was something he said a couple of times: yes, doing it this way is hard. Suck it up. It gets easier.

They Are In Your Network, Now What?

This talk was presented by Joel Esler of Sourcefire. Joel is a really smart guy and a great presenter, and he didn’t disappoint. My big takeaway from this one was his discussion of Razorback, which I really think is going to be one of the next big things in intrusion detection. I think a lot of the crowd missed the point on this. There were a lot of complaints about the amount of legwork required to integrate the tool, but I think most of those people were overlooking the early stage the tool is in and the potential impact of community-released nuggets and detection plugins. I played with Razorback when it was first released and look forward to digging into it again once some of the setup and configuration pains are eased. I’ve already thought of quite a few nuggets that I could possibly write for it.

Analysis Pipeline: Real-time Flow Processing

I’m a huge fan of SiLK for netflow collection and analysis so I was excited to hear Daniel Ruef from CERT|SEI talk about Analysis Pipeline, a component that adds some cool flexibility to SiLK. Overall, I was really impressed with the capability and am looking forward to playing with the next version when it comes out in a couple of months. I always say that if you aren’t collecting netflow you are missing out on some great data, and SiLK is a great way to start collecting and parsing netflow for free. If you are already using SiLK, please do yourself a favor and look into the free add-on Analysis Pipeline.

Advanced Command and Control Channels

I thought this was an awesome overview of traditional and more advanced C2 channels that malware uses. I don’t think anything here was really new, but the way the presentation was broken down was very intuitive and the examples that were given were rock solid. This talk was given by Neal Keating, a cyber intel analyst with the Department of State.

Final Thoughts

I really enjoyed the conference and honestly consider it one of the best and most relevant conferences for folks in cyber security within the gov/mil space. My only major complaint was that a few vendors managed to sneak their way into speaking and basically giving product sales pitches rather than technical talks. I’m hoping that feedback will make it back to the US-CERT folks and more effort will go into preventing that from happening in the future. I hate showing up to a talk that I hope to learn something from and being drilled with sales junk about products I don’t want. Yes, I’m looking at you General Dynamics and Netezza.

Overall, the staff did a great job of organizing and I’d be happy to have the opportunity to attend and speak at GFIRST 2012 in Atlanta next year.

I had the opportunity to take the SANS FOR610: Reverse Engineering Malware course in Orlando a couple of weeks ago and I wanted to write about my experience with the course. It’s no secret that I’m a big proponent of SANS. I’ve taken SEC 503 and SEC 504 at live events and I also mentor both courses here locally in Charleston. I wanted to take FOR610 as my next course because malware analysis is something I’ve not done a significant amount of. I’ve done a fair amount of behavioral analysis but very little code analysis at the assembly level and the course syllabus appeared to be heavy on that subject so it seemed like a natural fit to help fill in some of my knowledge gaps.

Instructor

The course in Orlando was taught by Lenny Zeltser. Lenny is the primary author of the materials, and he also runs a great blog over at http://blog.zeltser.com/ that I’ve followed for quite some time. I’ve been to a lot of different training courses and have also provided courses myself, so I’ve seen plenty of bad instructors and good instructors. One of the things I find most challenging when teaching is taking highly complex subject matter and breaking it down in such a way that it is understandable. Being able to do this effectively is one of my primary criteria for defining a good instructor. By that measure, Lenny is perhaps one of the best teachers I’ve had. He took all of the highly complex concepts and broke them down in such a way that they were understandable at some level for everyone in the class. He provided clear guidance and assistance during the lab portions of the class, and I don’t remember a single question that was asked that he didn’t have an immediate answer for. His depth of knowledge on the subject was very apparent and appreciated.

Difficulty

The course really has two distinct sides to it: behavioral analysis and code analysis. Depending on your background, you may find this course very difficult at times and easier at others. I have written several programs in languages including Python, PHP, and C as a function of my primary job role, so I understand programming concepts, but I’m not a professional programmer by any stretch. That being the case, I had a harder time with the code analysis portions of the course. If I didn’t have any programming experience, I think I would have been completely lost on more than a few occasions. On the other side of the coin, I had no problems whatsoever with the behavioral analysis instruction and labs, but I could tell that several other people in the class did. From what I gathered by talking to people and looking at name badges, roughly 65-85% of the folks in my class were programmers of some sort. The course is touted as not requiring any previous programming experience, but I think to get the full benefit from the class, you should at least be familiar with core programming concepts, preferably in an object-oriented language.

Course Content

The course was 5 days long and covered a variety of topics. I’ve outlined some of those here along with the new skills I gained or enhanced as a result of what we learned.

Day 1

The first half of the first day was devoted to the setup of the virtual malware analysis lab used in the course. This is done in such a way so that the virtual lab can be used after you leave the class to do real world malware analysis in your organization using the virtual infrastructure. The second half of day one focused on using the lab for behavioral analysis.

New Skills I Gained: Knowledge of new malware analysis tools.

Day 2

This day built upon our knowledge of behavioral analysis and introduced new concepts related to it. We were introduced to dissecting packed executables as well as JavaScript and Flash malware.

New Skills I Gained: Automated unpacking of packed files. Tools for dissection and extraction of malicious code in Flash objects.

Day 3

This day was devoted to code analysis. We were introduced to assembly and spent a great deal of time looking at commonly identifiable assembly patterns used in malware. This was one of the most useful parts of the class for me. We also looked a bit at anti-disassembling techniques that malware authors use.

New Skills I Gained: Enhanced understanding of assembly. A plethora of anomalies to look for in assembly level code analysis of malware. Patching code at the assembly level to get a desired outcome.

Day 4

The fourth day focused on analysis of malware that was designed to prevent itself from being analyzed. We looked at packers and learned how to manually step through malware code to unpack it for analysis. The day ended with a detailed and highly valuable look into deobfuscating malware in browser scripts.

New Skills I Gained: Detailed understanding of assembly for malware analysis. Manual extraction of unpacked code from packed executables.

Day 5

The final day of the course was another one of the most useful parts of the course for me. The first half of this day focused on analysis of malicious Microsoft Office files and malicious PDFs. After lunch, we covered shellcode analysis and memory analysis.

New Skills I Gained: Tools and procedures for extracting malicious code from MS Office files and PDFs. Better understanding of PDF file structure. Extraction of malware running in memory.

Labs

The labs were an integral part of the course. In the labs we analyzed real malware samples in our virtual analysis lab. I’m incredibly happy that we looked at REAL code from REAL attackers rather than simple malware created in a lab for the purpose of the course. Doing things this way we got to see how attackers will often take shortcuts or write bad code that we have to sort through rather than just dissecting cookie cutter malware with no imperfections. The labs served their purpose, helping reinforce new concepts in a practical manner. During the course, everyone had their laptops open and two virtual machines running at all times as we would dive into them for exercises very frequently.

Although I was very pleased with the labs in some ways, I am critical of them for a few other reasons. Prior to the class, you are provided some instructions on how to setup a single Windows based VM that is destined to be infected with malware repeatedly throughout the class. In addition, the instructions said we would be given a version of Remnux, the reverse engineering malware Linux distribution created by Lenny, to use during the class when we got there. I got this all up and running without any problems, but I was pretty upset when I got to the class to find out that there was quite a bit more setup to do. As a matter of fact, almost the entire first half of the first day of instruction was taken up by additional lab configuration. We were given a CD that contained a variety of tools that were to be installed on our Windows VM. I think all in all, we had to install about 25 different tools. Several people asked why these weren’t provided prior to the class and we were told it was so that we would take more ownership over our malware analysis labs and could ask questions. Although I can respect the comments in support of this, I think providing these tools prior to the class along with the other instructions would allow for better use of time. At lunch the first day I felt a bit cheated as my company had paid for an expensive course where I was just sitting around installing software. Providing this software prior to the course and having people come prepared would have allowed for a whole half day of additional instruction which would have been incredibly valuable.

The other primary issue I had with the labs was the format in which they were laid out. In most of the labs, Lenny would teach us a concept and then step through the process on his own system. Then he would turn us loose on our systems to work on the same example he just walked through. Although somewhat helpful, it wasn’t entirely effective since we had just seen him do the same example we were working through. I would contrast this with the lab format in the SEC 503: Intrusion Detection In-Depth course. In that course, students are given a workbook with lab exercises. The instructor there would teach a concept, go through a lab on screen, and then turn students to the workbook and give them some time to work through similar, but different examples. This format provided a great deal more value because we had to do quite a bit more thinking to get through the examples on our own, rather than just recreating what the instructor did.

Summing It Up

Overall, my experience with FOR 610 was very valuable and I’m thrilled I got the chance to take the course. I walked away with a lot of new skills and am able to provide a lot of value to my organization as a result. I now feel completely comfortable performing code analysis of malicious binaries. I also learned more assembly than I ever thought I would and feel like I could even write some simple programs in assembly should I choose to punish myself in that manner. I also gained a greater understanding of lower level operating system components which will prove useful in several cases. Make no mistake, this is a very difficult course, which is why it’s numbered so high. It is the highest level forensics course they teach, and it will challenge you. However, if you are up to it, there is a lot to be learned here, and I have no doubt that it is the best malware analysis course you will find.

One of the more important skills in intrusion detection and analysis is the ability to evaluate an IP address or domain name in order to build an intelligence profile on that host. Gathering this intelligence can help you make more informed decisions about the remote hosts communicating with your network and determine whether they are of a malicious or hostile nature. I recently wrote a two-part article on collecting threat intelligence for WindowsSecurity.com which describes some methods for building an intelligence profile on a host or network.
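Before reaching out to external sources like WHOIS or reputation services, it helps to rule out addresses that can never be hostile remote hosts in the first place. Here is a minimal triage sketch using Python's standard ipaddress module; the sample addresses (including the lab server IP from an earlier post) are just illustrative.

```python
import ipaddress

def triage(ip_str):
    """Coarse first-pass classification of an IP before deeper enrichment."""
    ip = ipaddress.ip_address(ip_str)
    if ip.is_loopback:
        return "loopback - not a remote host at all"
    if ip.is_multicast:
        return "multicast - not a unicast endpoint"
    if ip.is_private:
        return "private (RFC 1918) - internal host, not Internet-routable"
    if ip.is_reserved:
        return "reserved - should not appear as a legitimate source"
    return "public - candidate for WHOIS, reverse DNS, and reputation lookups"

# Example addresses to classify
for addr in ["192.168.47.132", "8.8.8.8", "224.0.0.1"]:
    print(addr, "->", triage(addr))
```

Only addresses that come back as public are worth spending enrichment effort on; anything else showing up as a "remote" peer usually points to a misconfiguration or spoofing instead.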

