Today we’re going to analyze yet another sample of PDF containing an XDP form. The difference between this sample and the one of my previous post is that this one will be less about JavaScript deobfuscation and more about anti-analysis tricks.

If we were to open the file directly from the file-system, we would be prompted to choose the correct file format:

But as such is not the case, we simply go to the decompressed stream in the Zip archive (or to the CFBF document, it doesn’t matter), position the cursor to the start of the file and press Ctrl+E.

We select the PDF format and then open the newly created embedded file in the hierarchy.

What we’ll notice by looking at the summary is that a stream failed to decompress, because it hit the memory limit. A tool-tip informs us that we can tweak this limit from the settings. So let’s click on “Go to report” in the tool-bar.

This will bring us to the main window. From there we can go to the settings and increase the limit.

In our case, 100 MBs are enough, since the stream which failed to decompress is approximately 90 MBs. Let’s click on “Save settings”, click on “Computer Scan” and then back to our file.

Let’s now repeat the procedure to load the embedded file as PDF and this time we won’t get the warning:

Just for the sake of cleanliness, we can also select the mistakenly identified CFBF embedded file and press “Delete”, in order to remove it from the analysis.

We are informed by the summary that the PDF contains an interactive form and, in fact, we can already see the XDP as child of the PDF.

We could directly proceed with the analysis of the XFA, but let’s just step back a second to analyze a trick this malware uses to break automatic analysis. The XFA is contained in the object 1.0 of the PDF.

Let’s go with the cursor to the stream part of the object (the one in turquoise), then let’s open the context menu and click on “Ranges->Select continuous range” (alternatively Ctrl+Alt+A). This will select the stream data of the object. Let’s now press Ctrl+T to invoke the filters and apply the unpack/zlib filter. If we now click on “Preview”, we’ll notice that an error is reported.

The stream is still decompressed, but it also reports an error. This is one of the trick this malware uses to break automatic analysis: the ZLib stream is corrupted at the very end.

Let’s now open the XFA. Immediately we can see another simple trick to fool identification of the XDP: a newline byte at the start.

The first part contains the information needed to construct ROP for the various versions of Adobe Reader. In the last part we can see that the JavaScript code sprays the heap. So probably they rely on a huge image embedded in the XDP (which is actually the reason why the XDP is so big) to trigger the exploit.

Let’s go back to the shellcode part and let’s analyze that. I’m talking about the part of code which starts with:

var shellcode ="\u9090\u9090\u9090\u9090\u9090\u9090...

We could of course copy that part of a text view, remove the \u, then convert to bytes and then apply a filter to reorder them, as in JavaScript the words are in big-endian. But we can do it even more elegantly and make our shellcode appears as an embedded file. So let’s select the byte array from the hex editor:

Let’s now press Ctrl+E and click on the “Filters” button.

What we want to do is to first remove the “\u” escape. So we add the filter misc/replace and specify “\u” as in and nothing as out (we leave ascii mode as default). Now we have stripped the data from the escape characters. Now we need to convert it from ascii hex to bytes. So we add the convert/from_hex filter. The last step, as already mentioned, is that we need to switch the byte order in the words. To do that, we’ll use the lua/custom filter. I only modified slightly the default script:

You can also download the Profiler project with the complete analysis already performed (same password: infected29A). Please notice, you’ll be prompted twice for the password: once for the project and once for the Zip archive.

Today I will go through with you on how we can make Profiler work extra hard for us. I will try to fill in required information about where to look out for information and how decode some of the information.

For those who want to follow along, this is a link to the .apk file. Do note, this is a MALICIOUS file, so please do the analysis in a “safe” environment. The password to the attachment is “infected29A”

Now, let’s start getting our hands dirty…and open the suspicious .apk file. Firstly, we are going to go through the source code and find out what is the important information that we can extract out.

One of the things that malware analyst are interested in is the “Command & Control” of the malware.

As the source code of the malware was made public, we can see where the IP address for the C&C is stored in my/app/client/ProcessCommand.java as shown in the image below.

As Profiler provides SDK for us to analyse DEX and extract relevant Dalvik code, we will be making use of that today by creating “Actions”. In order to make an action out of it, go to “Extensions” in the main window, then select the Actions tab and click on “Open user plugin directory” as shown below.

[ Scripting the C&C Extraction ]

Create the following file, “androrat.py” in there. The code for “androrat.py” is shown below:

The 2nd sample that we will be looking at is OmniRat. The detection rate for OmniRat is just moderate, 20/54 in VT.

The actual piece of malicious code is actually Base64 encoded and hidden away in the resources.asrc file as you can see in the image below

From the image below, we could extract the APK and even inspect it on the fly.
So select the the Base64 encoded string in the resources.asrc file and then press Ctrl+E and click on the filters button on the bottom right.

Even though sending malware via zipped attachments in spam emails is nothing new and had been around for eons but many people are still puzzled at how it works. Thus, I will go through with you on how to do it with Profiler. I will try to fill in required information about where to look out for information and how decode some of the information.

Firstly, we are going to learn how are a bit about the .msg file format and how is it used to store a message object in a .msg file, which then can be shared between clients or message stores that use the file system.

From an investigator’s point of view, you should always analyze the .msg file without installing Outlook. In order to analyze the .msg file without Outlook, we can read more about the file format from:

[ Part 1 : Getting Started ]
For those who want to follow along, this is a link to the .msg file. Do note, this is a MALICIOUS file, so please do the analysis in a “safe” environment. The password to the attachment is “infected29A”

The msg file is already flagged by Profiler, as it contains some suspicious features.
Each “__substg” contains valuable pieces of information. The first four of the eight digits at the end tells you what kind of information it is (Property). The last four digits tells you the type (binary, ascii, Unicode, etc)

0x007d: Message header

0x0C1A: Sender name

0x0C1F: Sender email

0x0E1D: Subject (normalized)

0x1000: Message body

[ Part 2 : Email investigation ]
If we are interested in email investigation, let’s check out the following file, “__substg1.0_0C1F001F”.

As we can see below, the sender’s email address is “QuinnMuriel64997@haarboutique-np.nl”
But is it really sent from Netherlands?

Well, let’s check out the message header located in “__substg1.0_007D001F” to verify that.

If we were to do through the message header, do a whois on “haarboutique-np.nl” and check out the MX server. We can confirm that the sender is spoofing email as well.

From the message header, we can conclude that the sender sent the email from “115.78.135.85” as shown in the image and the extracted message header as shown below.

Received: from [115.78.135.85] ([115.78.135.85])

by mta02.dkim.jp (8.14.4/8.13.8) with ESMTP id u44L8X41032666

for <info@dkim.jp>; Thu, 5 May 2016 06:08:35 +0900

Whois information showed that IP address where this spam email is sent from is from Vietnam.
But it doesn’t mean that the attacker is from Vietnam. Anyone in the world can buy web hosting services in Vietnam. This is just to let you know that the attacker is definitely not sending from “haarboutique-np.nl”

[ Part 3 : Email investigation ]
Using this information opening the “__substg1.0_0E1D001F” file and we can see the subject, “Re:“

Please find attached document you requested. The attached file is your account balance and transactions history.

Regards,
Muriel Quinn”

Awesome, Muriel Quinn is sending me my account balance and transactions history which I may or may not have requested at all. Awesome, he is also attaching the files to the email just for me. This is definitely suspicious to me.

[ Part 4 : Email attachment ]
Now that we are interested in the attachments, let’s look at “Root Entry/__attach_version1.0_#00000000” and refer to the specifications again.

//Attachments (37xx):

0x3701: Attachment data

0x3703: Attach extension

0x3704: Attach filename

0x3707: Attach long filenm

0x370E: Attach mime tag

If we were to look at “__substg1.0_3704001F”, we will see that the filename of the attachment is called “transa~1.zip” and the display name “__substg1.0_3001001F” of the attachment is called “transactions-625.zip”.

Now let’s look at the actual data located within “__substg1.0_37010102” as shown below.

Now, let’s press “Ctrl+A” to select the entire contents. Then copy it into a new file as shown in the image below.

But as we can see on the left, Profiler can identify what is inside the attachment. There are 3 Javascript files inside the .zip file.

Now let’s fire up “New Text View” and copy the contents of “transactions 774219.js” as shown below.

Press “Ctrl+R” and select “Beautify JavaScript” and Profiler will “JSBeautify” it for you. But let’s add some “Colouring” to it by doing “Right-click -> Language -> JavaScript” as shown below.

We can use Profiler to debug the JavaScript but I shall leave that as an exercise for the readers.
The decoded JavaScript will look something like this.

As we can see from the image above, it is downloading from “http://infograffo[.]com[.]br/lkdd9ikfds” and saving it as “ew3FbUdAB.exe” in the victims’ TEMP directory.

We won’t be going through on reversing the malware.

In the meantime, we hope you enjoyed reading this and would be happy to receive your feedback!

Recently version 2.6 of Profiler has been released and among the improvements support for XDP has been introduced. For those of you who are unfamiliar with XPD, here’s the Wikipedia description:

“XML Data Package (XDP) is an XML file format created by Adobe Systems in 2003. It is intended to be an XML-based companion to PDF. It allows PDF content and/or Adobe XML Forms Architecture (XFA) resources to be packaged within an XML container.

XDP is XML 1.0 compliant. The XDP may be a standalone document or it may in turn be carried inside a PDF document.

XDP provides a mechanism for packaging form components within a surrounding XML container. An XDP can also package a PDF file, along with XML form and template data. When the XFA (XML Forms Architecture) grammars used for an XFA form are moved from one application to another, they must be packaged as an XML Data Package.”

So I’ll use the occasion to show the reversing of a nice PDF with all the goodies. Let’s open the suspicious PDF.

The PDF is already heavily flagged by Profiler, as it contains many suspicious features.

If we take a look, just out of curiosity, at the object 8 of the PDF we will notice that the XDP data contains a bogus endstream keyword to fool the parsers of security solutions.

Profiler handles this correctly, so we don’t have to do anything, just worth mentioning.

Let’s take a look at the raw XDP data.

As you can see, it is completely unreadable because of the XML escaped characters. Even this is not really important for us, since the XML parser of Profiler handles this automatically, again just worth mentioning.

So let’s open directly the embedded XDP child and we can see a readable and nicely indented XML.

We can see that the XML contains JavaScript code, but Profiler already warns us of this. So let’s just click on the warning.

The code isn’t readable. So let’s select the JavaScript portion and then press Ctrl+R->Beautify JavaScript.

Much better, isn’t it?

The code is quite easy to understand although it’s obfuscated. It takes a value straight from the XDP, processes it and then calls eval on it.

This is the value it takes:

What we want is the result of the processing, before eval is called. So what I did is to modify slightly the JavaScript code like this:

What it does is basically to spray the heap using an array. It changes the payload based on the version of Adobe Reader. The version is retrieved by calling the _l5 function.

Now we could just examine the _l1 or _l2 payloads directly, but just to make sure I let the code generate a spray portion. So I changed the code accordingly and avoided to actually spray a lot of data.

We can run this script in the JavaScript debugger (Ctrl+R->Debug JavaScript).

The final print will give us the payload in memory. We can copy the just the initial part, avoiding the padding. Let’s paste the string into a text editor in Profiler and then Ctrl+R->Hex string to bytes.

If we look at the payload, we can see that the beginning (the marked portion) looks like ROP code. So in order to avoid looking for the gadgets in memory, let’s skip the ROP as it most likely is only going to jump to the actual shellcode. Let’s assume that is the case and thus focus on the data which follows.

We can see a web address at the end of the data. So we could just assume that the shellcode downloads an executable and runs it. But just for the sake of completeness, let’s analyze it.

We can of course disassemble the shellcode by applying a filter to it (Ctrl+T->x86 disasm). But what we’ll do is to use a debugger via Ctrl+R->Shellcode to execute. This way we can quickly step through what it does.

Yep. That’s an icon. In an executable. In a process address space. In a raw memory dump.

And here is the video demonstration:

This is just a proof-of-concept. We still haven’t decided whether to develop this further. It really depends on whether the forensic community is interested in having such a product. So, even a re-tweet will have an impact on our decision. :) We wanted to show what is currently possible and, of course, it’s not the end of the cool things which are possible.

In case we decide to go ahead with the development, we will probably create a beta-test group of potential customers and decide with them a roadmap for a 1.0 version, taking into consideration all those features which are essential to them. What we already support are the Windows versions that go from XP to 10 on the following architectures: x86, x86-PAE, x64. And, of course, the software itself, just like Profiler, runs on Windows, OS X and Linux.

And now to the more technical side if you’re interested. What we have shown in this demonstration is just a Python extension for Profiler. To be more specific, it’s only about 1000 lines of Python code and this includes all the UI views. The bulk of the work went into exposing all the necessary capabilities of our SDK to Python. Of course, all this work also benefits other extensions, not just the memory forensics ones. So, if this project ends with this post, it’s really not a tragedy, as we haven’t lost any significant time developing specific stuff for it.

So why did we choose to write our memory forensics support in Python, rather than in C++, which would’ve taken us a lot less time? The reasons are several. The memory forensic field is always changing rapidly and setting code in stone by compiling it wouldn’t be a good idea. Also, we wanted to give our customers the possibility to inspect the code and to modify it. Just by looking at existing code it’s extremely easy to write new utilities. While on the other hand, having our core engine and UI written in C++, makes our tool very fast. We think this is the perfect combination.

If you’re wondering why we didn’t use Volatility as a backbone, the answer is that it would’ve been incompatible on a licensing level and way too difficult to fit nicely into our existing framework to accomplish what we wanted to do.

We hope you enjoyed the demo and we would be happy to receive your feedback!

Of course, it doesn’t have to be a custom view, it can even be a simple text view or a hex view, although usually custom views make more sense for a dialog, as you’ll probably want to show some standard buttons like “Ok” and “Cancel” at the bottom.

Capstone bindings

While Capstone has been part of Profiler for quite some time now, now it’s possible to directly call its official Python bindings. The module can be found under ‘Pro.capstone’ and can be imported easily to be made working with existing code:

import Pro.capstoneas capstone

Failed allocation security issue

In the Qt framework memory allocations fail silently, at least in the release version. We didn’t notice it, because in the debug version they would at least throw an exception preventing further execution. Since in release the execution wouldn’t be stopped, it was in some cases possible to trigger a failed allocation and then make the program use memory it didn’t own (so basically a buffer overflow). This problem has now been fixed.

Credit goes to the Insid3Code Team for having found and reported the issue.

Following our recent introduction to Scan Providers, here’s a first implementation example. In this post we’ll see how to add support for Torrent files in Profiler. Of course, the implementation shown in this post will be available in the upcoming 2.5.0 release.

We also warn the user if the file exceeds the allowed maximum. We perform the whole scan logic in the UI thread, since we’re not doing any CPU intensive operation and thus we return SCAN_RESULT_FINISHED, which causes the _threadScan method not be called.

When retrieving data from the dictionary, we also make sure that it is in the correct type, so that the code which handles this data won’t end up generating an exception when trying to process an unexpected type.

We could still extract more information from the torrent file. For instance, we could show the list of hashes and to which portion of which file they belong to. If that’s interesting for forensic purposes, we can easily add this view in the future.

Version 2.5.0 is close to being released and comes with the last type of extension exposed to Python: scan providers. Scan providers extensions are not only the most complex type of extensions, but also the most powerful ones as they allow to add support for new file formats entirely from Python!

This feature required exposing a lot more of the SDK to Python and can’t be completely discussed in one post. This post is going to introduce the topic, while future posts will show real life examples.

The name of the section has two purposes: it specifies the name of the format being supported (in this case ‘TEST’) and also the name of the extension, which automatically is associated to that format (in this case ‘.test’, case insensitive). The hard limit for format names is 9 characters for now, this may change in the future if more are needed. The label is the description. The ext parameter is optional and specifies additional extensions to be associated to the format. group specifies the type of file which is being supported; available groups are: img, video, audio, doc, font, exe, manexe, arch, db, sys, cert, script. file specifies the Python source file and allocator the function which returns a new instance of the scan provider class.

Let’s start with the allocator:

def allocator():
return TestScanProvider()

It just returns a new instance of TestScanProvider, which is a class dervided from ScanProvider:

_clear gives a chance to free internal resources when they’re no longer used. In Python this is not usually important as member objects will automatically be freed when their reference count reaches zero.

_getObject must return the internal instance of the object being parsed. This must return an instance of a CFFObject derived class.

_initObject creates the object instance and loads the data stream into it. In the sample above we assume it being successful. Otherwise, we would have to return SCAN_RESULT_ERROR. This method is not called by the main thread, so that it doesn’t block the UI during long parse operations.

This is a minimalistic implementation of a CFFObject derived class. Usually it should contain at least an override of the CustomLoad method, which gives the opportunity to fail when the data stream is first loaded through the Load method. SetDefaultEndianness wouldn’t even be necessary, as every object defaults to little endian by default. SetObjectFormatName, on the other hand, is very important, as it sets the internal format name of the object.

The code above will issue a single warning concerning native code. When _startScan returns SCAN_RESULT_OK, _threadScan will be called from a thread other than the main UI one. The logic behind this is that _startScan is actually called from the main thread and if the scan of the file doesn’t require complex operations, like in the case above, then the method could return SCAN_RESULT_FINISHED and then _threadScan won’t be called at all. During a threaded scan, an abort by the user can be detected via the isAborted method.

From the UI side point of view, when a scan entry is clicked in summary, the scan provider is supposed to return UI information.

This will display a text field with a predefined content when the user clicks the scan entry in the summary. This is fairly easy, but what happens when we have several entries of the same type and need to differentiate between them? There’s where the data member of ScanEntryData plays a role, this is a string which will be included in the report xml and passed again back to _scanViewData as an xml node.

For instance:

e.data="<o>1234</o>"

Becomes this in the final XML report:

<d><o>1234</o></d>

The dnode argument of _scanViewData points to the ‘d’ node and its first child will be the ‘o’ node we passed. the xml argument represents an instance of the NTXml class, which can be used to retrieve the children of the dnode.

But this is only half of the story: some of the scan entries may represent embedded files (category SEC_File), in which case the _scanViewData method must return the data representing the file.

Apart from scan entries, we may also want the user to explore the format of the file. To do that we must return a tree representing the structure of our file:

If you have noticed from the screen-shot above, the analysed file is called ‘a.t’ and as such doesn’t automatically associate to our ‘test’ format. So how does it associate anyway?

Clearly Profiler doesn’t rely on extensions alone to identify the format of a file. For external scan providers a signature mechanism based on YARA has been introduced. In the config directory of the user, you can create a file named ‘yara.plain’ and insert your identification rules in it, e.g.:

rule test
{
strings:
$sig = "test"
condition:
$sig at 0
}

This rule will identify the format as ‘test’ if the first 4 bytes of the file match the string ‘test’: the name of the rule identifies the format.

The file ‘yara.plain’ will be compiled to the binary ‘yara.rules’ file at the first run. In order to refresh ‘yara.rules’, you must delete it.

One important thing to remember is that a rule isn’t matched against an entire file, but only against the first 512 bytes.

Of course, our provider behaves 100% like all other providers and can be used to load embedded files:

Our new provider is used automatically when an embedded file is identified as matching our format.

Navigation

Follow Us

Subscribe to our Newsletter!

Our Newsletter is primarily directed to users of the Profiler who want to learn more about the product, use cases, discover tips & tricks, be kept up-to-date with future improvements and participate to polls.