12.7. Memory Scanning in Kernel Mode

Memory scanning in kernel mode is very similar to its user-mode implementation in basic functionality. It will always be safer to perform memory scanning in kernel mode. Furthermore, a kernel-mode memory scanner can scan the upper 2GB of kernel address space for viruses. Currently only a few viruses have kernel-mode components on NT-based systems, but it is very likely that more such viruses will be developed in the future as file system filter drivers. This section explains the major problems in developing a kernel-mode memory scanner for current Win32 viruses running in user mode. I will also introduce the basic procedures that are important in scanning the upper 2GB of address space for kernel-mode viruses.

12.7.1. Scanning the User Address Space of Processes

In kernel mode, the user address space of each process can be scanned much as it is in user-mode memory scanning. In fact, many system functions can be adapted for use in kernel mode. There are several ways to get the process IDs of the running applications. One possibility is to use the NtQuerySystemInformation() API, which is exported from NTOSKRNL.EXE by name and is therefore easily callable as ZwQuerySystemInformation() (ZwQSI) from a kernel-mode driver. Of course, the function is undocumented, so the necessary declarations must be specified and included first; otherwise, the driver cannot be built and linked correctly.

12.7.2. Determining NT Service API Entry Points

Unfortunately, some of the important APIs needed for memory scanning are not exported by name from the kernel (NTOSKRNL.EXE) for the use of a kernel-mode driver. When a user-mode application calls the VirtualQueryEx() API in KERNEL32.DLL, the call is redirected to the NtQueryVirtualMemory() API in NTDLL.DLL.

Surprisingly, this API is not available from the kernel (NTOSKRNL.EXE). The function is there for the use of the NTOS, but it is not exported for other drivers. Evidently, NT's designers did not consider situations in which "messing" with the Virtual Memory Manager's operations is necessary.

A driver can solve this problem in two different ways. It can be linked against NTDLL.DLL, which is the easiest way. The other possibility is to develop a function similar to the user-mode GetProcAddress() (with some important differences) that can get the function ID of a particular NT service by traversing the export table of NTDLL.DLL in the system context. Such a function can pick up the NT service function ID, which is placed into the EAX register with a MOV instruction at the entry point on IA32 systems. This way the driver can specify the correct address of the function inside the Windows NT executive (NTOSKRNL.EXE) as KeServiceDescriptorTable+NtServiceID, that is, the service table entry selected by the ID.

Listing 12.13 is an example of an INT 2E function call in NTDLL.DLL, NtCreateFile().

Listing 12.13. A Sample Service Call on NT on IA32

Windows XP implements similar stubs in NTDLL.DLL on IA32, but the code uses dynamically created "trampolines." The syscall sequence will not use an INT 2E if the processor supports the sysenter instruction. In such a case, the NTDLL functions instead call into one of the last pages of the user-mode process to execute code. The content of this page is generated on the fly beforehand, according to the features of the processor, as shown in Listing 12.14. Indeed, this page is not part of any DLL on the system.

Intel implemented a new instruction called sysenter in Pentium II processors. It is a faster way to switch to kernel mode, so XP saves a few CPU clocks in millions of API calls, making the system faster.

Listing 12.14. A Sample Service Call on Pentium II Processors

Note that the ID still remains available at the native API entry points (27h in this example).
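Because both stub styles begin by loading the service ID into EAX, picking up the ID reduces to decoding one MOV EAX, imm32 instruction (opcode B8h followed by a little-endian 32-bit immediate). The sketch below shows only that decoding step on a byte buffer. The function name `decode_service_id` is an assumption made for illustration, and locating the stub through the NTDLL.DLL export table is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Decodes "MOV EAX, imm32" (B8 xx xx xx xx) at the start of a stub.
 * Returns 0 and stores the immediate in *id on success, or -1 if the
 * buffer is too short or does not start with that instruction. */
int decode_service_id(const uint8_t *stub, size_t len, uint32_t *id)
{
    if (stub == NULL || id == NULL || len < 5 || stub[0] != 0xB8)
        return -1;

    /* The immediate is stored little-endian on IA32. */
    *id = (uint32_t)stub[1]
        | ((uint32_t)stub[2] << 8)
        | ((uint32_t)stub[3] << 16)
        | ((uint32_t)stub[4] << 24);
    return 0;
}
```

For an illustrative stub that starts with the bytes B8 27 00 00 00, the function stores 27h in `*id`. A stub that has been patched by a hook (for example, one that starts with a JMP) fails the opcode check, which a scanner can treat as a warning sign in its own right.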

12.7.3. Important NT Functions for Kernel-Mode Memory Scanning

Several functions are very useful for scanning the memory of processes.

NtQueryVirtualMemory() queries the pages of a particular process. This function is not documented, but it is only a translation of the VirtualQueryEx() API to ZwQueryVirtualMemory(), which is placed in the kernel (NTOSKRNL.EXE). Its name is shown by the Windows NT kernel debugger because the debug information contains the name of the function. This function (like several others), however, is not exported by name from the kernel (NTOSKRNL.EXE).

Other useful functions are NtTerminateProcess(), NtOpenThread(), NtSuspendThread(), NtResumeThread(), and NtProtectVirtualMemory(). Most of these functions are translations of their user-mode equivalents but remain undocumented. The header declarations must be done one by one for each of these functions. Furthermore, ZwOpenProcess() can be used to gain a handle to the processes.

12.7.4. Process Context

In NT, kernel-mode drivers run in three different classes of context [4]:

System process context

Specific thread (and process) context

Arbitrary thread (and process) context

Depending on the circumstances, the lower 2GB of virtual memory maps any user process or no user process at all. The memory scanner should be able to switch to the context of a particular process to map the process to the lower 2GB of the virtual memory. One way to do this is to use the undocumented KeAttachProcess(). The necessary header declaration of this API is

VOID KeAttachProcess(
IN PEPROCESS Process
);

This kernel API needs a PEPROCESS parameter (a pointer to an EPROCESS structure). A normal process ID can be converted to this pointer with another undocumented API, PsLookupProcessByProcessId(), which takes the process ID as its first parameter and returns the PEPROCESS pointer through its second [13].

Whenever the kernel-mode memory scanner needs to read a page, it should switch the context to the particular process it wants to access. KeDetachProcess() returns from any context to the system context:

VOID KeDetachProcess(
VOID
);

The query function must be carefully developed to work correctly in all problematic circumstances. Because the process pages can be queried as previously described, unavailable pages should not be accessed. Otherwise, the memory scanning would be terribly slow with far too many exceptions slowing down the system.
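One way to express the "do not touch unavailable pages" rule is a filter over the region information returned by the query. The sketch below is a user-mode model: `region_info` is a hypothetical, simplified stand-in for the information the query returns, and the three constants mirror the values defined in the Windows headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Values mirror the Windows header definitions. */
#define MEM_COMMIT     0x1000u
#define PAGE_NOACCESS  0x01u
#define PAGE_GUARD     0x100u

/* Hypothetical, simplified view of one queried region. */
typedef struct {
    uintptr_t base;
    size_t    size;
    uint32_t  state;    /* MEM_COMMIT, MEM_RESERVE, MEM_FREE */
    uint32_t  protect;  /* PAGE_* flags */
} region_info;

/* Returns 1 if the region can be read without raising an exception. */
int region_is_scannable(const region_info *r)
{
    if (r->state != MEM_COMMIT)
        return 0;                       /* reserved or free: nothing mapped */
    if (r->protect & PAGE_GUARD)
        return 0;                       /* touching it raises a guard fault */
    if ((r->protect & 0xFFu) == 0 || (r->protect & 0xFFu) == PAGE_NOACCESS)
        return 0;                       /* no readable protection set */
    return 1;
}

/* Sums the sizes of all regions that pass the filter. */
size_t count_scannable_bytes(const region_info *regions, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (region_is_scannable(&regions[i]))
            total += regions[i].size;
    }
    return total;
}
```

Skipping guard pages is a deliberate choice here: reading one would consume the guard status and could disturb stack growth in the scanned process.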

An alternative is simply to use the ZwOpenProcess() function to get a handle to each process to be scanned.

12.7.5. Scanning the Upper 2GB of Address Space

The upper 2GB of the address space contains executable code, such as the NT executive, system drivers, and third-party drivers. The list of drivers can be queried using Object Manager functions. Alternatively, NtQuerySystemInformation() can be used with the information query class 11 (0x0B), which returns the list of loaded drivers with their base addresses.

It is not very easy to query the pages of that area because there are no API interfaces to do so. It would be feasible to query the page tables, but that leads to service pack-dependent coding and further stability concerns. The easiest solution is to check the base address of each driver and parse their structures directly in memory. Because any driver has complete access to the upper 2GB of address space, this is possible and can be done easily by parsing the section header table of each driver in memory. In principle, this is what the SoftICE debugger does to show the loaded drivers list.
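Parsing the section header table means following the standard PE layout from the image base: the DOS header holds the offset of the PE signature at 3Ch, a 20-byte file header follows the signature, and the 40-byte section headers start after the optional header. The sketch below does this on a plain byte buffer so that it can be tried in user mode. The names `section_info` and `list_sections` are assumptions for illustration; a driver would apply the same offsets to the loaded image at its base address.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char     name[9];            /* 8-byte section name plus terminator */
    uint32_t virtual_address;    /* RVA of the section */
    uint32_t virtual_size;
    uint32_t characteristics;
} section_info;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fills out[] with up to max_out section entries.
 * Returns the number of entries written, or -1 if the image is malformed. */
int list_sections(const uint8_t *image, size_t len, section_info *out, int max_out)
{
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    uint32_t pe = rd32(image + 0x3C);            /* e_lfanew */
    if (pe > len || len - pe < 24)               /* signature + file header */
        return -1;
    if (memcmp(image + pe, "PE\0\0", 4) != 0)
        return -1;

    uint16_t count    = rd16(image + pe + 6);    /* NumberOfSections */
    uint16_t opt_size = rd16(image + pe + 20);   /* SizeOfOptionalHeader */
    size_t   table    = (size_t)pe + 24 + opt_size;

    int n = 0;
    for (uint16_t i = 0; i < count && n < max_out; i++) {
        size_t off = table + (size_t)i * 40;     /* 40 bytes per section header */
        if (off > len || len - off < 40)
            return -1;
        memcpy(out[n].name, image + off, 8);
        out[n].name[8] = '\0';
        out[n].virtual_size    = rd32(image + off + 8);
        out[n].virtual_address = rd32(image + off + 12);
        out[n].characteristics = rd32(image + off + 36);
        n++;
    }
    return n;
}
```

A scanner would add each section's RVA to the driver's base address to get the range to scan. Discardable sections (such as INIT) may no longer be present in memory and need to be skipped or probed first.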

Scanning the paged and nonpaged pool area is not trivial, either. The easiest solution is to find a reference to the virus code, such as a hook routine on a handler that points to the virus code from a fixed location.

12.7.6. How Can You Deactivate a Filter Driver Virus?

Such a question might sound strange because no existing virus is known to use this approach. But the method is definitely possible, and we can be sure that such a virus will be developed. (In this section, I assume that the reader has basic knowledge of Windows NT drivers.)

The problem is that filter drivers cannot be unloaded; at least this is the suggestion of Microsoft, so it should be considered a very strong opinion. File system filter drivers are attached to the device object of a particular file system driver (ntfs.sys, fastfat.sys, and so on), or they are attached to another filter driver's device object, building up a chain of filter drivers. In fact, a particular filter can be attached to many device objects of other drivers. (Figure 12.8 shows an example.)

A filter driver can be easily detached from the end of the list, but it is not safe to do so. An additional problem is that a filter driver between two other filter drivers, or between a file system driver and a filter driver, cannot be detached because this would simultaneously detach all drivers after itself on the chain. Therefore, it was necessary to find another solution. After several attempts, I found an approach that works.

The execution of a driver begins in its DriverEntry function. Within this function, filter drivers typically create a new device object (a hook device) and then attach it to the device object of the device to be filtered by calling the IoAttachDevice(), IoAttachDeviceToDeviceStack(), or IoAttachDeviceByPointer() functions.

File system filter drivers must support fast I/O, so they implement a FAST_IO_DISPATCH table with function pointers to their own fast I/O entry points. After performing the fast I/O filtering in a particular fast I/O hook routine, the filter driver must call the original fast I/O entry point of the driver to which the filter driver's hook device was attached. Interestingly, Windows NT itself does not save the pointer to the lower device object. Each driver must save these pointers, and it is recommended to keep this pointer in the DeviceExtension of the hook device. The DeviceExtension, however, is an absolutely driver-specific structure, and each driver can define it in its own preferred format, or not use it at all. All this makes our task more difficult.

It seems the only way to safely "deactivate" a filter driver is to "filter it" in a nonstandard way that does not let the driver receive control in any of its filtering routines. Instead, the driver to which the particular filter driver was attached must be called. To do this, the refiltering driver (DeactivatorDriver) must patch the filter driver's driver object (VirusDriver). All MajorFunction[] entries of the VirusDriver should instead point to the HookDispatch routine of the DeactivatorDriver. Additionally, the FastIoDispatch field of the VirusDriver should point to the fast I/O table of the DeactivatorDriver.

When this patch is performed correctly, the fast I/O entries of the DeactivatorDriver will get control instead of the VirusDriver's own. The major problem is that each fast I/O routine of the DeactivatorDriver should call the fast I/O routine under the VirusDriver by traversing the device object chain of the VirusDriver. The AttachedDevice field of all file system drivers' device objects must be checked to see whether a VirusDriver's hook device is attached to them. When the AttachedDevice field of a file system driver's device object is equal to any of the VirusDriver's hook device object pointers, the device object pointer of the file system driver should be saved. Whenever the DeactivatorDriver's fast I/O is called, the fast I/O can be redirected to the driver to which the VirusDriver was attached. This is because the saved device object pointer will point to a device object that has a pointer to the owner's driver object. If that driver object has a fast I/O entry point for the fast I/O that has been filtered by the VirusDriver's fast I/O routine, it should be called by passing the incoming parameters to it without any modification. From then on, the fast I/O of the VirusDriver will be refiltered and deactivated.

In a similar manner, the Dispatch routine of the DeactivatorDriver must complete the I/O Request Packets (IRPs) of the VirusDriver or pass the IRPs to the corresponding device object with the IoCallDriver() routine.
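The two core steps of this approach, patching the VirusDriver's driver object and locating the device object underneath its hook device, can be modeled in user mode. Everything in the sketch below is a hypothetical simplification: the `sim_*` types stand in for the real driver and device objects, only the fields discussed in the text are kept, and the dispatch routine is a placeholder. The count of 28 major function slots follows the NT headers (IRP_MJ_MAXIMUM_FUNCTION + 1).

```c
#include <stddef.h>

#define MJ_FUNCTION_COUNT 28   /* IRP_MJ_MAXIMUM_FUNCTION (0x1B) + 1 */

typedef int (*sim_dispatch_fn)(void *device, void *irp);

/* Stand-in for a fast I/O table; the function pointers are omitted. */
typedef struct sim_fast_io { size_t size; } sim_fast_io;

/* Stand-in for a driver object: only the patched fields are modeled. */
typedef struct sim_driver {
    sim_dispatch_fn major_function[MJ_FUNCTION_COUNT];
    sim_fast_io    *fast_io_dispatch;
} sim_driver;

/* Stand-in for a device object. */
typedef struct sim_device {
    sim_driver        *driver;           /* owner of this device */
    struct sim_device *attached_device;  /* device attached on top, if any */
} sim_device;

/* Placeholder hook: a real routine would complete or forward the IRP. */
int deactivator_dispatch(void *device, void *irp)
{
    (void)device;
    (void)irp;
    return 0;
}

/* Redirects every dispatch entry and the fast I/O table of the filter. */
void patch_filter_driver(sim_driver *virus, sim_dispatch_fn hook,
                         sim_fast_io *fast_io)
{
    for (int i = 0; i < MJ_FUNCTION_COUNT; i++)
        virus->major_function[i] = hook;
    virus->fast_io_dispatch = fast_io;
}

/* Returns the device whose attached_device is the virus hook device,
 * or NULL if none of the candidates has it attached. */
sim_device *find_lower_device(sim_device *const *candidates, size_t n,
                              const sim_device *virus_hook)
{
    for (size_t i = 0; i < n; i++) {
        if (candidates[i]->attached_device == virus_hook)
            return candidates[i];
    }
    return NULL;
}
```

In this model, the pointer returned by `find_lower_device` is what the deactivator saves; its `driver` field then leads to the dispatch and fast I/O entry points that should receive the redirected calls.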

Complicated? No doubt about it! Certainly this could be done more easily if the filter driver model of NT-based systems were organized slightly better.

12.7.7. Dealing with Read-Only Kernel Memory

Windows 2000 implemented read-only kernel memory. If read-only memory is on, non-writeable pages, such as code sections of drivers, cannot be changed. This is to protect the OS kernel (and its data) and drivers from each other. However, this feature also protects the code of computer viruses that are active in kernel memory, so their removal requires extreme care.

It turns out that this feature is only active if the system has 128MB or less physical memory. In this case, the virtual memory is managed with 4KB pages, but if more memory is available, the system switches to large page mode. So far, the protection is not available in that mode.

Nevertheless, there are a couple of ways to deal with read-only memory. For example, the WP flag of the CR0 control register of the IA32 processor could be flipped during writes. This can be done in kernel mode but must be performed with special care (it is definitely a hack!). When WP is off, all pages can be written into.
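WP is bit 16 of CR0. The sketch below models only the bit arithmetic on a saved CR0 value; the helper names are assumptions for illustration. Reading and writing the register itself requires privileged MOV instructions in kernel mode, which are not shown here, and a real driver would have to keep the window with WP cleared as short as possible and restore the saved value afterward.

```c
#include <stdint.h>

#define CR0_WP (1u << 16)   /* Write Protect flag of CR0 on IA32 */

/* Returns nonzero if the WP flag is set in the given CR0 value. */
int cr0_wp_is_set(uint32_t cr0)
{
    return (cr0 & CR0_WP) != 0;
}

/* Returns the CR0 value with WP cleared (read-only pages become writable
 * for supervisor-mode code). */
uint32_t cr0_without_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Returns the CR0 value with WP set again. */
uint32_t cr0_with_wp(uint32_t cr0)
{
    return cr0 | CR0_WP;
}
```

Saving the original value and writing it back unchanged, instead of unconditionally setting WP afterward, keeps the hack from altering a system on which the flag was already off.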

12.7.8. Kernel-Mode Memory Scanning on 64-Bit Platforms

Most of the 32-bit Windows viruses can already infect 64-bit Windows systems. This is because 64-bit Windows supports 32-bit executables by default. However, 64-bit viruses have already begun to appear. It is expected that virus writers will create a lot more viruses on AMD64 and EM64T (the IA32 with 64-bit extension) systems because programming on those systems is simpler, and such systems are relatively cheap, so attackers will more likely gain access to them. Somewhat contradictorily, the first 64-bit viruses appeared on the Itanium processor [14].

The 32-bit processes are linked against 32-bit DLLs only and implemented as a WOW (Windows-on-Windows) system. NTDLL.DLL is 32-bit in the 32-bit process but eventually switches to a 64-bit kernel (NTOSKRNL.EXE).

In the system process, NTDLL.DLL is 64-bit. Porting the 32-bit memory scanner to 64-bit is straightforward. You can decode the entry points of the 64-bit NTDLL.DLL exports to choose the ID that is equivalent in function to the EAX value on IA32. This is what you need to decode to get an NtServiceID for memory scanning if you want to follow the 32-bit approach described in this chapter. Listing 12.15 is a 64-bit Windows syscall on the Itanium.

Listing 12.15. A System Service Call on IA64

This code can be confusing to someone unfamiliar with the Itanium processor. The actual NtServiceID is moved to the r8 register (it is 6 in this example). The long 64-bit value is moved to the r2 register. After that, you have a do-nothing operation.

This is not junk, though. The Itanium processor encodes instructions into bundles. Each bundle has three slots, so it holds up to three instructions. Therefore the compiler needs to fill the remaining slots with NOPs if the next instruction cannot be encoded there. The code execution goes from bundle to bundle via IP, the instruction pointer. The instruction slots are decoded according to a mask (the template field of the bundle).

Finally, the code branches to b6 (branch register), which has the value of the r2 register to complete the service call. To decode the NtServiceID, you must decode the mov r8=6 operation that is encoded into the same bundle as the following MOVL and NOP operations. This is the easy part.

After you have the NtServiceID, you need to understand how the GP (global pointer) register works on the Itanium. The GP is a preassigned value for accessing data within a load module. There is no global pointer on the x86 architecture. It was already used, however, on RISC machines, and NT defined it long ago for the PowerPC.

When a standard call is made, GP must be set by the caller. The GP value is available in the load module's header via IMAGE_DIRECTORY_ENTRY_GLOBALPTR.

To call an NTAPI function, you need to get the GP of the kernel (such as NTOSKRNL.EXE). That is a simple task because you can use ZwQuerySystemInformation() to get the base of the module easily.

You also need to know how to define a function pointer. On IA64, each API and function is defined as a PLABEL_DESCRIPTOR (PLD) [15], which pairs the entry point of the function with the global pointer of its module.

Thus the API you need to call dynamically must be defined as a PLD. Before making a call to the function, you need to set the GP to the kernel's (NTOSKRNL's) GP and set the EntryPoint to the corresponding address in the service descriptor table entry, which you can get with the decoded ID from NTDLL.DLL. In this way, calls to nonexported APIs become a trivial task.
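The preparation of such a descriptor can be sketched as follows. This is a portable model, not driver code: the two-field layout (entry point followed by global pointer) is assumed from the description above, and `make_plabel`, the `service_table` array of entry addresses, and the sample values are hypothetical stand-ins for the kernel's real service table and GP.

```c
#include <stdint.h>

/* Assumed layout of a PLD: entry point first, then the global pointer. */
typedef struct plabel_descriptor {
    uint64_t EntryPoint;
    uint64_t GlobalPointer;
} plabel_descriptor;

/* Builds a descriptor for the service selected by service_id.
 * service_table is modeled as a plain array of entry addresses;
 * kernel_gp is the GP value taken from the kernel image's header. */
plabel_descriptor make_plabel(const uint64_t *service_table,
                              uint32_t service_id,
                              uint64_t kernel_gp)
{
    plabel_descriptor pld;
    pld.EntryPoint    = service_table[service_id];
    pld.GlobalPointer = kernel_gp;
    return pld;
}
```

On a real IA64 system, a function pointer of the appropriate type would then be made to point at the descriptor itself, so that an indirect call loads both the entry point and the GP from it.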

Note

The AMD64 and EM64T processors do not use a GP register.

Scanning the driver spaces can be solved in a way similar to IA32 systems. See Listing 12.16 for a map snippet of the 64-bit NTOS and loaded drivers on IA64. The System32 folder is a remnant directory name that stores the 64-bit NTOS image. NTDLL.DLL is still loaded at the "bottom" of the user address space.