
SmokeLoader is an old but still very popular bot, generally used to drop other malware families or to deploy additional modules that add extra capabilities.

The other day I was checking a sample from a recent campaign, and as I was stepping through the loader I found some interesting stuff I hadn't seen before. The latest releases of the 2018 version of SmokeLoader implement some anti-debugging checks, as well as anti-VM, anti-disassembly and anti-analysis tricks in general.

CERT Polska did a great job describing most of them. Some are really neat, like the one that calculates the address of the next instruction based on the values of BeingDebugged and NtGlobalFlag.

However, the anti-hooking check is not described in the blog post (maybe it wasn't present yet) and it's actually preventing SmokeLoader from detonating on Cuckoo Sandbox and possibly others.

The assembly snippet of this check is the following:

I've commented the assembly to make it easier to read.

Basically, the bot has a list of Windows functions that it checks for userland hooks. To do this, it compares the first bytes of each function with hardcoded assembly byte patterns commonly used when hooking functions in userland.

We can see in Cuckoo's monitor source code that the opcodes used for hooking are the same.

TL;DR: A technical low-level analysis of the cheat, including its licensing scheme and the differences between the public and private versions.

CS:GO is one of the most popular competitive online games; it has 520,285 current players as I write these lines. As in any other competition-driven game, cheaters arise, and especially in the CS community they have become a serious problem.

Today we are taking a look at the public and private version of a cheat for this game!

I won't mention the name of the cheat to avoid giving them free advertisement and because it's not necessary for this post, but if you're into this topic, you'll probably guess.

Before we start, it's important to mention that I managed to get a private version build through an alternative channel 😈. This means I never paid the developer, so I didn't support their business in any way! Damn you, cheaters!

Public vs Private version

This cheat is quite accessible, as the developer provides a public (free) version with all the capabilities for users to try. The most important "downside" is that the public cheat is obviously detected by VAC, so if you use it on a VAC-protected server, it's only a matter of time before your account gets VAC-banned.

Here is where the paid private version comes into play: Customers get a unique build that is guaranteed to be undetected.

Licensing

Each private version build of the cheat is tied to a machine, to prevent piracy, reselling, and so on.

The licensing procedure gets the SystemDrive environment variable and, using DeviceIoControl with the IOCTL_DISK_GET_DRIVE_GEOMETRY control code, reads the technical characteristics of the hard drive. The Processor Brand String is then read using the cpuid instruction.

This information is formatted into a string, hashed with SHA1, and mutated with a custom ASCII rotation algorithm:

The resulting string is your unique license, which is sent to the cheat developer when you buy it, and in return you get a build that only works in the computer that generated this license.

How the cheat works

This cheat is an external cheat, which means all the work is done out of the CS:GO process (no DLL injection).

The first thing it does is open the csgo.exe process, and get the base addresses of client.dll and engine.dll.

Then it uses patterns to find game structures (offsets) in memory. These patterns usually match opcodes in the game binaries where memory pointers or other useful information are referenced. Patterns are also used to find game functions and strings.

For example, one of the patterns is:

89 0D ? ? ? ? 8B 0D ? ? ? ? 8B F2 8B C1 83 CE 08

If we look for these bytes in the client.dll file, we get the following hit:

If the cheat wants to run an in-game console command, it can allocate memory in the game process, pass the arguments to the function through this memory, and spawn a thread with CreateRemoteThread whose start address points at the beginning of the procedure.

When the cheat has located everything it needs, it starts a bunch of threads that implement each of the functionalities. These threads are in charge of monitoring and manipulating the game memory using the ReadProcessMemory and WriteProcessMemory functions.

By changing the values of internal game structures at will, the cheat achieves the functionalities it offers.

I have identified some of the functions and renamed them in my pseudocode:

Private version protection

The public version is poorly protected: the strings are encrypted with a simple algorithm, but there is no code obfuscation or PE packing.

On the other hand, the private version is protected with Themida, a commercial packer that, depending on its configuration, can be quite effective at protecting executables.

It's very likely that they use Themida for two purposes:

The first is to protect the license check from being patched. The program can be manipulated to accept any license on any computer, but reconstructing a fully working version of the packed executable and patching it may be quite tricky.

The second, and most important, is to prevent VAC signatures from detecting the cheat while it runs. Themida can protect the original opcodes of the program while it is loaded in memory and running, and writing signatures (patterns) for those opcodes is one of the methods VAC uses to detect cheaters.

Closing

If we compare it to other cheats, this one is simple in terms of functionality, but still quite effective.

Bear in mind that the CS:GO binaries used for the analysis are not from the latest game update, as I wrote this one week ago. The binaries I used are:

This means that the cheat signatures may have been slightly modified to work with the new executables, and the offsets probably won't be the same if these binaries changed in the latest version of the game.

I got this camera not long ago, and as it usually happens, in addition to its main purpose it served some hours of fun!

It's one of the cheapest wireless IP cameras right now; you can find it for around $40 depending on the store. The manufacturer is Sricctv, a company based in Shenzhen specialized in CCTV.

It uses Linux 2.6 and has a MIPS processor (MIPS 24K V4.12).

Firmware

I couldn't find the firmware on the official website, and they didn't agree to send me the latest version. Luckily for me, I got firmware for a camera similar to mine, so I could study the system a bit without messing with the hardware.

The firmware file format is pretty straightforward. It expects a 32-byte header string, the size of the package in a 4-byte value, a ZIP file with the contents, and a 32-byte footer:

There are two types of upgrades handled by the upgrader, system upgrades and web app upgrades.

- System upgrades overwrite the main system binaries, located in /system/system/ and have this header and footer combination: wifi-camera-sys-qetyipadgjlzcbmn, wifi-camera-end-nvxkhfsouteqzhpo.

- Web app upgrades overwrite the contents in /system/www/ and have this header and footer combination: wifi-camera-app-qazwsxedcrfvtgba, wifi-camera-end-yhnujmzaqxswcdef.

Interestingly, web app upgrades are expected to contain a password protected ZIP file, but system upgrades are not. As the upgrader is in the system firmware image, we can look at the binary and locate the hardcoded password.

Telnet access for everyone!

While there are several ports listening on the camera, the most interesting are probably 23 (telnet) and 81 (default HTTP panel). When we extracted the firmware, we located a nice string:

root:LSiuY7pOmZG2s:0:0:Administrator:/:/bin/sh

The hashed password for root is easy to crack: 123456

So now we can log in to the camera via telnet with the root account and get access to the whole file system.

This can also be used to disclose all the configuration, including user and password of the admin account in the web panel:

$ telnet 192.168.1.111
Trying 192.168.1.111...
Connected to 192.168.1.111.
Escape character is '^]'.

We can change the root password, but as the passwd file lives in a volatile partition of the file system, the default will be restored after a reboot.

Connect the camera and say Hi to the Internet!

This bothered me a lot when I connected the camera for the first time. If your router has UPnP enabled, which is very common in SOHO routers, the camera will use this protocol to open an external (Internet-facing) port on your router and forward it to the port where the web management service is listening. By default this port is 81.

If you haven't set up your credentials yet, the camera is wide open to everyone. And if a vulnerability is found in the service, no matter what your configuration is, the camera will be there for sneaky eyes.

This is probably a "convenience" for non-technical users to connect from external networks using the P2P app provided by the vendor. The camera will also get the external IP of your network connecting to www.ip138.com, so the app knows where to connect.

Conclusion

If you care about your privacy, this camera is not for you. I guess you get what you pay for: the camera has good specifications and performance, but the software design is just horrible.

A new version of pafish has recently been released. It comes with a set of detections that are completely new to the project (read: not new techniques), based on CPU information. To get this information, the code makes use of the rdtsc and cpuid x86 instructions.

Here we are going to look at the rdtsc technique, and how it is used to detect VMs.

What is rdtsc?

Wikipedia's description is pretty straightforward [1]:

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of cycles since reset. The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX. Its opcode is 0F 31.

So it is a counter that increases on each CPU cycle.

Well, it actually depends on the processor.

Initially, this value was used to count the actual internal processor clock cycles. It was meant for developers to measure how many cycles a routine takes to complete. It was good to measure performance.

In the latest Intel processor families, this counter increases at a constant rate, determined by the maximum frequency the processor can run at that boot. Maximum does not mean current, as power-saving measures can dynamically change the speed of the processor. This means it is no longer good for measuring performance, because the processor frequency can change at runtime and ruin the metric. On the other hand, it can now be used to measure time.

This is explained much better in reference [2].

So, how is this used to detect VMs?

On a physical (host) system, subtracting the counters of two consecutive rdtsc instructions results in a very small number of cycles.

On the other hand, doing the same in a virtualized (guest) system, the difference can be much bigger. This is caused by the overhead of actually running inside the virtual machine.

I wrote a small program to verify this behaviour; it does the subtraction ten times with a sleep period in between. You can get the source from here.

This is similar to what pafish does, the output in a physical machine looks like this:

Try to compile and run this code with different compiler optimizations if you want to have some fun ;)

This is the theory, but in practice it depends on the virtualization product, its configuration, and the number of cores assigned to the guest system.

For instance, VMware virtualizes the TSC by default. This can be disabled, but it is not recommended; the TSC virtualization can also be tweaked in the configuration. There is much more information about this in references [3] and [4].

There is also a substantial difference when the VM has two or more cores assigned. With one core, the differences are not that big and get close to a physical processor, although some peaks can happen from time to time. With two or more cores, the differences are much bigger and consistent.

I suspect the second behaviour is caused by CPU ready times, which is explained in references [5] and [6].

Have a look at the following example in VirtualBox:

One core assigned, note the peaks

Two cores assigned, the differences are large and consistent

So we can conclude two things.

The first one is that this method is not always reliable, as it is heavily dependent on the processor and the virtualization product.

The second one is that, if I were running a sandbox cluster, I would try to assign only one core to each guest machine, not only because it would make this method a bit less reliable, but also for performance.

Our fabulous sandbox uses an emulator instead of a VM, should I care about this?

Well, generally speaking, you should not care about this specific method then. Emulators replicate the whole machine hardware, including the CPU at the lowest level (binary translation), so the emulator has its own TSC implementation, and the cycle count for a routine should be similar to a physical CPU.

We can verify this running our testing program in QEMU:

QEMU is nice

I hope you enjoyed the post and this new pafish release, thanks to mlw.re members for helping me with the tests :)

Check out the references for more information on this topic and general understanding on how VMs / emulators work!