RE

Monday, February 23, 2015

Note: It is assumed that the reader has a solid grasp on paging and operating system basics.

Before we dig in, this post should not be construed as an attack on ESEA, anti-cheat software, or fair gaming in general. It is simply an analysis thereof, detailing what the ESEA driver does on your machine. Although analysis will make attack vectors clear and obvious, no code or detailed explanation of how to leverage these points will be given.

ESEA's anti-cheat has a long-standing record of making it difficult for cheat users and cheat developers to last any significant amount of time without getting hit with a ban from its services. While ESEA employs countermeasures other than SBD (signature-based detection), this writeup will focus on how they catch known cheat software in the wild, and not on lone wolf tampering.

When we look at other software based anti-cheat out there today, we see lots of obfuscation, window name checks, object/handle name checks, handle access checks, and internal or cross process virtual memory scans. In fact, we don't see much deviation from the same basic checks, some of which rely on poorly documented Windows APIs in order to carry out.

On the other side of the coin, we see not so much a "detect-and-react" scenario as an all-out "prevention" scenario: prevent the cheat software from even accessing game memory or game metadata to carry out its job. These mechanisms are typically implemented with more invasive techniques, such as hooking system services, monitoring filesystem access, or making use of ObRegisterCallbacks, which was added after the advent of KPP (PatchGuard) so anti-virus vendors could restrict access to critical processes and threads without having to hook system services.

In other cases it's just a mixture of both. Either way, the path to finding a known cheat signature, or preventing it from even being started, has a predictable and all too common set of events that will be followed in order to do so.

ESEA takes a very different, yet much more effective, approach. Perhaps at some point you downloaded some memory imaging software and created a memory image file on your disk, dumping the entire contents of DRAM to one large file for forensic purposes, or maybe just because you were bored and wanted to poke through it?

This, in a sense, is close to what ESEA does, minus the 8, 16 or 32 GB file part.

Instead, ESEA maps physical memory pages frame by frame into a user space mapping and scans them at the byte granularity attempting to match signatures of known cheats. A driver is required to at the very least initiate this type of scan, or in some cases, to carry it out entirely from a kernel thread.
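To make the "byte granularity" part concrete, here is a minimal sketch of what matching one signature against one mapped page could look like. It is plain portable C operating on a buffer, not ESEA's code; the signature_t layout, the wildcard mask and the scan_page name are all assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: pattern bytes plus a mask where 0 means "wildcard". */
typedef struct {
    const uint8_t *bytes;
    const uint8_t *mask;   /* 1 = byte must match, 0 = ignore this byte */
    size_t length;
} signature_t;

/* Returns the offset of the first match in page[0..page_size), or -1 if none. */
long scan_page(const uint8_t *page, size_t page_size, const signature_t *sig)
{
    if (sig->length == 0 || sig->length > page_size)
        return -1;
    for (size_t off = 0; off + sig->length <= page_size; off++) {
        size_t i = 0;
        /* Advance while the current byte is a wildcard or matches exactly. */
        while (i < sig->length &&
               (sig->mask[i] == 0 || page[off + i] == sig->bytes[i]))
            i++;
        if (i == sig->length)
            return (long)off;
    }
    return -1;
}
```

Note that a scan done strictly one frame at a time, like this sketch, would miss a signature that straddles two physical pages, so a real scanner would need to account for that somehow.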

There are 4 ways to perform this type of scan using exported kernel functions. In fact there are really only 3; the 4th is just for the nitpickers who would do it in a totally unsupported and performance-slaughtering way. We'll get to that.

1. The first way requires the following functions:

MmGetPhysicalMemoryRanges

ZwOpenSection

ZwMapViewOfSection (can be user caller, see below)

The rundown: This way requires that the ZwOpenSection caller (must be a kernel caller) opens the PhysicalMemory section object in the \Device object directory. If the handle is opened in the context of a user process, via KeStackAttachProcess or while servicing an inbound read/write/ioctl IRP, then user code can use this section handle in subsequent calls to ZwMapViewOfSection, or the higher level MapViewOfFile. In this case, the offset argument represents the physical page. This is undoubtedly the slowest way to perform an entire system sweep, due to the overhead ZwMapViewOfSection incurs. This is the method ESEA currently uses.

2. The second way requires the following functions:

MmGetPhysicalMemoryRanges

MmMapIoSpace

The rundown: This method is probably the easiest and most straightforward of all options. It simply requires feeding MmMapIoSpace with a physical page and we are provided with a virtual mapping to probe. However since MmMapIoSpace is a kernel routine, it of course requires that our scan be carried out entirely in kernel mode.

3. The third way requires the following functions:

MmGetPhysicalMemoryRanges

MmMapMemoryDumpMdl

The rundown: MmMapMemoryDumpMdl, while exported, is not documented by Microsoft. The routine is only documented on 3rd party websites. Just build an MDL describing the physical page and pass it as the one and only argument. The routine will map the MDL for you. This routine has much less overhead than MmMapIoSpace, let alone ZwMapViewOfSection, so it clocks in much faster when doing a full system scan.

4. The fourth method requires the following function:

MmGetPhysicalMemoryRanges

The rundown: This method would be to manipulate the paging structures yourself. Not only would this be unwise, clumsy and idiotic, but in order for the memory manager to not asynchronously bring down the system with a memory corruption bugcheck, you'd have to make sure your work goes uninterrupted by disabling interrupts. Or make your own page tables. Either way you can't be running concurrent tasks on that CPU. Option 4 is certainly a crock pot of disaster unless you know what you are doing.

Let's take note how all 4 options require MmGetPhysicalMemoryRanges. This, in fact, is not optional. I've had a few people ask me questions like "well why can't I just get the amount of physically installed DRAM and scan from 0 to whatever?"

Simply put: Not all addresses put out on the system bus decode to DRAM.
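As a rough illustration of why the range list matters, the sketch below walks a list of (base, length) pairs and visits only the page frames inside them, instead of blindly going from 0 to "whatever". It is portable C, not driver code: phys_range_t only mirrors the idea of the PHYSICAL_MEMORY_RANGE entries, the all-zero terminator is how the returned array is commonly described, and frame_fn and for_each_frame are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4096ULL   /* assumed 4 KB page frames */

/* Mirrors the idea of PHYSICAL_MEMORY_RANGE: base address and length in bytes. */
typedef struct {
    uint64_t base_address;
    uint64_t number_of_bytes;
} phys_range_t;

/* Hypothetical callback invoked once per page frame that is safe to map. */
typedef void (*frame_fn)(uint64_t physical_address, void *ctx);

/* Walks a list terminated by an all-zero entry; returns the number of frames visited. */
uint64_t for_each_frame(const phys_range_t *ranges, frame_fn fn, void *ctx)
{
    uint64_t count = 0;
    for (; ranges->base_address != 0 || ranges->number_of_bytes != 0; ranges++) {
        uint64_t end = ranges->base_address + ranges->number_of_bytes;
        for (uint64_t pa = ranges->base_address; pa + FRAME_SIZE <= end; pa += FRAME_SIZE) {
            if (fn)
                fn(pa, ctx);   /* e.g. map the frame and scan it */
            count++;
        }
    }
    return count;
}
```

Anything that falls in the holes between the reported ranges is simply never touched, which is the whole point.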

If you have ever done any operating system, legacy bios, or UEFI development, you know all about this.

x86 type architecture follows Von Neumann style architecture. In contrast to something like Harvard, a Von Neumann central processing unit uses the same bus for everything. The stored program is fetched using the same bus, the memory ops carried out be it read or write, are put on the same bus, and the downstream peripherals decode certain ranges and either finish the bus cycle or forward it to a known decode range that can handle the transaction.

Northbridge(functions integrated on cpu today) registers, southbridge registers, and peripheral device registers are all accessed, from the perspective of the CPU, on the same bus. So if we arbitrarily put the address 0xffee0000 out on the bus, not knowing that this is the decode range belonging to some level triggered device that has a read cycle sensitive interrupt indicator, and we send out a read cycle before this device's ISR de-asserts it, the ISR chain will never know the device needed attention. Which would likely lead to a system halt or crash. Another example could be a low latency device that instead of using DMA, just manages an on chip FIFO buffer, so when the CPU is performing PIO on that device, a read cycle instantly makes the new data available on the 2, 4 or 8 byte FIFO buffer.

While the aforementioned cases are probably extreme, they can happen. A much more likely scenario, though, would be a machine-check exception. A machine check is generally not recoverable, and can be brought about by probing memory regions that the firmware said were off-limits.

Ok great, so how does Windows build this table? Answer: firmware. Your legacy BIOS or your UEFI system which bootstrapped your operating system. Long before the platform firmware even maps your bootloader stub, it detects physically installed DRAM modules, sets up legacy VGA decode ranges, disables transactions to unused ports, sets up ACPI tables, reports usable cores (maybe you disabled HT ;p) and then finally, jumps to your bootloader. Therefore, there is an interface for the OS to query, about what memory ranges belong to DRAM, what ranges are reclaimable, and what ranges are off limits. This information MUST be obeyed. There are no exceptions to this.

Otherwise, the effectiveness of this type of scan is superb. A target cheat module/executable does not even need to be running. It could very well be in file cache, and this means, it will inevitably be in one or more DRAM pages. If it was already run once, Windows is secretly caching a section object for it in case it loads again, in which case it will also be found. Modifies itself on the fly? If it's within the range of a mapped image, those memory writes will just fault in private pages, meaning the original and unmodified pages are still in physical memory.

All in all, this type of scan is an extremely effective means for wreaking havoc on software developed for cheating, and in the end it only boils down to one single function and the data obtained therein. The question is, can you trust said data?

Monday, November 11, 2013

GetProcessIoCounters is a higher level API provided to application developers in order to count IO transactions for a process, or a job object (group of processes). Even with such an innocent face, it can easily be used to determine if the process has an active debug port.

The IO_COUNTERS structure, which is filled as a result of the call, tells us operation counts, and byte transfer counts. If you don't already know, it's pretty simple:

When a debugger is attached and its target calls NtMapViewOfSection for a section object that is an image (hint: mapping a DLL image), a debug event is queued. Included in this debug event is a file handle to the image. The debugger thread waiting on the port then calls ObDuplicateObject to provide a file handle as part of its debug message to the application.

In Peter Ferrie's anti-debug paper, he describes how to deduce that a debugger is attached due to the debugger end not closing its duplicated handle, thereby preventing exclusive access to the file.

This method however is not based off whether the debugger code forgets to close the handle, or uses it (either way preventing exclusive file access) but instead will work regardless, even if the debugger does not use the file handle and closes it upon reception. This is because the initial handle is opened within the context of the target via NtOpenFile (therefore increasing OtherOperationCount by 1), and although closed before NtMapViewOfSection returns, the fact that it incremented means the process has a debug port to dispatch messages to. Otherwise NtOpenFile would never be called, and the other operation count would not increment.

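So detection can be as simple as:

IO_COUNTERS before, after;
GetProcessIoCounters((HANDLE)-1, &before); // store before.OtherOperationCount somewhere
// hImageSection is a hypothetical handle to a mapping created from an image (SEC_IMAGE)
MapViewOfFile(hImageSection, FILE_MAP_READ, 0, 0, 0); // remember, only builds a debug event if it's an image
GetProcessIoCounters((HANDLE)-1, &after);
// if after.OtherOperationCount went up across the mapping, a debug port is likely present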

Sunday, October 20, 2013

If you haven't already read this, you probably should. It covers the fundamentals of what will be discussed here. That way, I can assume you already know what is going on and I don't have to cover all the minuscule details in this post :)

Back already eh?

Simply setting the trap flag with an iret/popf variant has always been a common technique to thwart single-stepping. There are also APIs that offer similar functionality; we won't cover them today because that isn't really in scope here.

One of the most common is something similar to this:

pushf
or word ptr [esp], 0100h
popf
xor eax, eax
xor ebx, ebx
nop

As you know, when the boundary of xor eax, eax is reached, we will have an int 01 trap with a saved IP of whatever follows it. Again, as you should hopefully know, this is a common method to trick a debugger that is already single stepping this sequence into thinking that it caused the exception and to continue right along. Now any debugger worth its weight in (bytes? gold? plugins?), or a user who isn't just auto-tracing and is looking manually, should catch this.

There are already a few plugins for various debuggers that check the trap flag status prior to popf/iret/syscall/ints and attempt to act accordingly, like resuming the trace operations at KiUserExceptionDispatcher.

Now let's look at this sequence again, but imagine that BTF is enabled.

pushf
or word ptr [esp], 0100h
popf
xor eax, eax
xor ebx, ebx
nop

Now let's just assume for a minute that no debugger is attached. Execution will continue right along after popf/popfd and no trap will be recognized. This, as you know, is because even though TF is set, we haven't hit a taken branch. Thus no trap. We could then modify our sequence a bit into something like this:

The trap will occur after the boundary of the unconditional jump is reached. The application can then handle accordingly.

Now let's throw OllyDbg into the mix and step through this sequence. You will notice how Olly will single step normally over the sequence. Olly will mask off Dr7.BTF after a debug event, even if it passes the event back to user code. This means the following situations could easily happen:

- A user or a plugin unaware of this during a trace could mistakenly let the application process a single step exception which followed an instruction that set EFLAGS.TF. The application would see this and act accordingly (like.. explode or something.)

- OllyDbg AND WinDbg both mask off Dr7.BTF when sending an exception back to KiUserExceptionDispatcher. This means that for the duration of the exception chain dispatching, BTF will have no effect.

The application must have wanted this, so pass it back. But since the debugger masked Dr7.BTF, setting the trap flag in your exception handler with popf/iret will cause a trap at the following instruction boundary. Otherwise nothing would happen until you either (a) reset the flag or (b) hit a taken branch. This is ample evidence that a debugger is involved.

IDA's win32 debugger and Cheat Engine do not have this problem, but don't worry, we have something up our sleeve for them. Also a quick side-note here; a year or so ago, a colleague of mine made some real fun of me for using Cheat Engine as a dynamic analysis and debugging tool. Contrary to whatever he thinks, anyone who does this as a passion loves Cheat Engine. The arsenal just isn't complete without it.

Here is how we can fool them all.

Reminder: LBR data will only be written to the ExceptionInformation structure if the trap flag is set when a #DB exception occurs. In this case we use ICEBP for our #DB. ICEBP for all intents and purposes is a #DB exception.

So if we single step OR branch step over the following magical sequence, it will easily be detectable:

Our first assumption is that the debugger is smart enough to detect ICEBP, whether it be by decoding the instruction stream or checking Dr6, and then passing the exception back to the application. If this isn't happening then the application already wins this round because the exception chain was never dispatched.

If no debugger is tracing this sequence, the ExceptionInformation fields rendered to our application via the EXCEPTION_RECORD structure will contain the linear address of the 'je 02h' instruction, and the second field will contain the linear address of the 'mov ecx, edx' instruction.

If a debugger were single stepping over this sequence, it's implied that it masked Dr7.BTF, and maybe even Dr7.LBR. In either case, even if it only masked one, the ExceptionInformation fields will have a null index, and no data.

Furthermore, if the debugger were branch tracing instead of single stepping over this sequence, meaning it left BTF and LBR on, the ExceptionInformation data would contain the linear address of KiDebugTrapOrFault's IRET instruction, followed by the linear address of 'mov ecx, edx'. If the debugger for some reason decided to mask off LBR but leave BTF enabled, the ExceptionInformation index would be null and the fields would be empty.

In either of the above cases, if the debugger didn't preserve LBR or BTF, improper values would be stored in the ExceptionInformation fields, and we could assume a debugger is attached.
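The three outcomes described above boil down to a small decision, sketched here in portable C. This is only an illustration of the check an exception handler could make on the values it receives; the enum, the classify_lbr name and the expected_from/expected_to parameters (the addresses of the branch and of the instruction it lands on) are assumptions, not part of any Windows API.

```c
#include <stdint.h>

typedef enum {
    LBR_CLEAN,      /* from/to match our own branch: nothing was tracing */
    LBR_MISSING,    /* no data recorded: BTF and/or LBR was masked off   */
    LBR_MISMATCH    /* data present but "from" is not our branch          */
} lbr_verdict_t;

/* info[] stands in for the ExceptionInformation fields, num_params for their count. */
lbr_verdict_t classify_lbr(uint32_t num_params, const uintptr_t *info,
                           uintptr_t expected_from, uintptr_t expected_to)
{
    if (num_params < 2)
        return LBR_MISSING;
    if (info[0] == expected_from && info[1] == expected_to)
        return LBR_CLEAN;
    return LBR_MISMATCH;   /* e.g. "from" points at a kernel IRET instead */
}
```

Anything other than LBR_CLEAN would be treated as "something is tracing us".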

The BTF and LBR Dr7 backdoors exist from XP to Windows 8 in both 32 and 64 bit editions of Windows making this a highly portable anti-debug/trace technique.

Tuesday, October 15, 2013

Finally had some time to look this one over. As you hopefully recall in the previous installment I mentioned how I noticed data fluctuation in the same area of the page for 32 bit builds of Windows 7 (haven't checked 8 for either build yet).

As I guessed it's pretty much the same functionality (garbage stack portion) and can be used to infer /debug. This is the mode where a kernel debugger is not necessarily attached, but can be at anytime. Other indicators such as KdDebuggerEnabled at 0x2D4 or KdDebuggerNotPresent which as you know can be queried with NtQuerySystemInformation will not be of any value.

Anyway, in this case it's close to the same, but not entirely. KdInitSystem parses the load options. If /debug is set, we expand our stack further than anticipated for a normal boot phase and land at DbgLoadImageSymbols, which uses int 2D (debugger services, like symbols ;p) regardless of whether or not a KD is actually present. If not, it's just caught by exception handlers in this case.

Now since we grew the stack quite a bit, and the stack pages were zeroed to begin with, we find ourselves at KiInitializeXStatePolicy. This function writes vendor specific extended processor feature bits into the shared page. It allocates a good 0x450 bytes, which then uncovers the garbage left behind (or is it?) from the DbgLoadImageSymbols interrupt control transfer and exception dispatch.

If the value at 0x4C0 is non-zero, this is enough to indicate /debug. It is highly improbable that the Xsave features will extend that far, but starting at Xsave and searching at a 4 byte boundary for 0xFFFFFD34 would be a more appropriate solution, similar to the 4 byte 'DBGP' signature for 64 bit builds.
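The "search at a 4 byte boundary" idea is simple enough to sketch. The function below is portable C working on a copy of the page in a buffer; find_dword and its start/end parameters are hypothetical, and where exactly the Xsave area begins is left to the caller rather than hardcoded here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the first 4-byte-aligned match in [start, end), or -1. */
long find_dword(const uint8_t *page, size_t start, size_t end, uint32_t value)
{
    /* Round the start offset up to the next 4 byte boundary. */
    for (size_t off = (start + 3) & ~(size_t)3; off + 4 <= end; off += 4) {
        uint32_t cur;
        memcpy(&cur, page + off, sizeof cur);   /* avoids unaligned access issues */
        if (cur == value)
            return (long)off;
    }
    return -1;
}
```

The same loop would work for the 64 bit case by passing the four 'DBGP' bytes packed into a dword instead of 0xFFFFFD34.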

This applies to an original deployed 32 bit copy, all the way to the most recent Windows updates.

Keep in mind this is only for 32 bit builds of Windows 7. The same deal exists in x86/64 targets but is a slightly different story.

Tuesday, July 23, 2013

No no, this isn't the single byte indicator at 0x2D4. Just in case you had maybe thought I lost my mind or something. I did however lose my mind over deciding whether or not they did this on purpose. Read on and post your thoughts.

Let's imagine an operating instance with no outstanding boot flags used to enable the kernel debugger. The data beyond the xsave features area (fpu xstor features etc) may look something like this:

Nothing out of the ordinary eh?

Alright. Let's boot with /debug and com port 1

Wow, would you look at all this extra data. Hey, I even see a string 'DBGP'! Let's analyze what is really going on here to see if this is on purpose or just simply some kind of accident. After KiSystemStartup passes the loader parameter block to KdInitSystem, KdInitSystem decides whether or not to initialize the kernel debugger based on the boot parameters. At the point of that decision, our kernel stack is in the following state. You'll have to excuse my art skills though, no fancy crayon drawings today:

data higher than SP. in use.
↑
RSP
↓
data lower than SP. not allocated (garbage)

As KdInitializeDebugger goes through its layers of execution, needless to say it expands SP as it goes. DBGP is actually an ACPI table in which HAL determines if existing and capable debug ports do exist. For example it ensures that the com port is an actual 16550 UART. This isn't limited to just serial ports; as you know, debugging over USB/network/IEEE 1394 is also available. ACPI simply states whether or not these interfaces abide by the Microsoft debugging standard. For instance the USB host controllers must have a debug interface, or they cannot be used for this purpose.

It just so happens that during this process, the table identifier 'DBGP' is saved to the stack prior to asking HAL to look up the table ;p

Thus when KdInitializeDebugger unravels itself, this extra data along with our lovely friend DBGP still exist in the garbage portion of the stack. Ok you are with me so far, that is good, let's continue.

A short time later, KiComputeEnabledFeatures allocates itself a structure to fill for xsave features. It just so happens that this structure overlaps the garbage left behind from KdInitializeDebugger. Otherwise the structure would in fact be zeroed out because it has not been used prior. This structure is then written to the xsave features portion of the kernel/user shared page, and contains this extra information. This extra information is enough to infer presence of a kernel debugger because without /DEBUG KdInitializeDebugger is never called.

This heading is also labeled as (x64). I did look at Windows in legacy operating mode and didn't notice the same results; however, there was some fluctuation, perhaps enough to detect the same flags. When I get more time I will have a look.

Tuesday, July 2, 2013

Been quite a while since my last entry. Spent some time in Key West, FL and spent some more time moving to the other side of town. I have some fun things to post about over the next month or so. So stay tuned ;p

When a kernel debugger can attach to the system (KdPitchDebugger == 0) the possibility exists for software (usermode included) to implement an event object type to be set to the signaled state when a time slip occurs. In this context, a time slip occurs because an exception that is passed to the kernel debugger puts all logical processors in a wait state with interrupts masked off.

No external interrupts from timing chips (pit, hpet) can occur. Thus when the logical processor(s) are continued, the machine is living in the past so to speak. Time keeps on slippin slippin slippin...

But..

Prior to exiting the debugger, KdExitDebugger will insert the KdpTimeSlipDpc DPC object into the processor's DPC queue. This DPC will queue a passive level work item routine (KdpTimeSlipWork) which will set a provided event object to the signaled state, if one is provided. User level software can set this field with NtSetSystemInformation with an infoclass of 0x2E. The Windows time service in particular sets this field when it starts up, that is, if the service is running. However it can still be reset. I haven't really looked over the Windows time service but my guess is that when and if it is notified of a time slip, it probably attempts to synchronize the system back over NTP, but who knows.. haven't looked.

We can be sure that if this DPC is fired that a kernel debugger is attached to the system because the only way the initial DPC can be queued is via KdExitDebugger. Control flow cannot reach that point unless an exception occurred which was forwarded to the debugger.

The passive level work routine will queue another timer based DPC object with KiSetTimer with a hardcoded due time of 94B62E00. Read as a negative (relative to the system clock) due time in 100 ns units, this value is 1,800,000,000 units, so every 180 seconds (3 minutes ;p) it will attempt to set your provided event object to the signaled state.
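For anyone who wants to double check the 3 minute figure, here is the arithmetic as a tiny portable C sketch. It assumes the constant is the low 32 bits of a sign-extended, relative due time expressed in 100 ns units; relative_due_time_seconds is a hypothetical helper name.

```c
#include <stdint.h>

/* Sign-extends a 32-bit relative due time (100 ns units) and returns whole seconds. */
int64_t relative_due_time_seconds(uint32_t low_part)
{
    /* Assumption: the upper half is just sign extension, so a negative
       value means "relative to now". */
    int64_t due = (int64_t)(int32_t)low_part;
    int64_t ticks = -due;          /* number of 100 ns intervals */
    return ticks / 10000000;       /* 10,000,000 intervals per second */
}
```

Feeding it 0x94B62E00 gives 180, which lines up with the 3 minute interval.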

Wednesday, March 13, 2013

My goal for this blog is to generally post undocumented details of the Windows operating system. By details I mean topics that would interest both software reverse-engineers and malware analysts alike. One of those topics to me is a lot more prominent than the rest, and that is mechanisms that attempt to detect or evade debugging. Whether it be DRM or actual malware, I'd have to say it's my favorite topic.

What we're going to discuss today has probably already been discussed elsewhere, however out of all the methods used to detect if a kernel debugger is attached to the system, I think this one is hardly used or mentioned. Therefore I think it warrants a quick discussion today.

As you probably already know, KeLoaderBlock is the first argument to KiSystemStartup. Among a plethora of other details this structure contains the boot flags from the current BCD entries corresponding to our current boot. For instance boot option selection timeout, test-signing, NX opt in or opt out, /debug flags for the kernel debugger etc.

KeLoaderBlock is not accessible from user-mode, but I'm always surprised that many are unaware that during initialization, the startup flags are written to the following registry fields.

HKLM\System\CurrentControlSet\Control - SystemStartOptions

From these flags the software can easily find out if the system was booted with /TESTSIGNING or /DEBUG ON
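Once the SystemStartOptions string has been read from the registry (that part is left out here), checking it is just token matching. The sketch below is portable C and assumes the value is a space separated list of options such as DEBUG or DEBUGPORT=COM1; has_boot_option and equal_nocase are hypothetical helper names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive compare of the first n characters of a and b. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Returns 1 if name appears as a whole token, optionally followed by "=value". */
int has_boot_option(const char *options, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = options;
    if (name_len == 0)
        return 0;
    while (*p) {
        while (*p == ' ' || *p == '/')   /* skip separators and leading slashes */
            p++;
        const char *tok = p;
        while (*p && *p != ' ')          /* find the end of this token */
            p++;
        size_t tok_len = (size_t)(p - tok);
        if (tok_len >= name_len && equal_nocase(tok, name, name_len) &&
            (tok_len == name_len || tok[name_len] == '='))
            return 1;
    }
    return 0;
}
```

Matching whole tokens matters here: a plain substring search for DEBUG would also hit DEBUGPORT=COM1, which says something slightly different.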

This method we discussed as you can see is very simple. So simple that it's often overlooked.