Wednesday, May 20, 2015

Today I'll be investigating an issue involving Bitdefender, which turned out to be more of a Windows bug than a Bitdefender one, although there are development changes Bitdefender could make, aside from a hotfix, to stop it. Bitdefender's 0x4A (IRQL_GT_ZERO_AT_SYSTEM_SERVICE) bug check has been prevalent for quite a while now, but there's little to no documentation on what causes it or how to solve it, just a few things to try such as updating or uninstalling Bitdefender.

Throughout all of the Bitdefender-related 0x4A crashes, the NT kernel was blamed as the faulting module:

Probably caused by : ntkrnlmp.exe

Given we're seeing ntdll, the likely reason the NT kernel is blamed for the crash is that most of the API exposed by ntdll is implemented in the NT kernel. Here the kernel image is ntkrnlmp.exe, the multiprocessor variant without Physical Address Extension (PAE).

All we can see is that we're exiting user-mode code using the KiSystemServiceExit function, and we go off the rails right there - KiSystemServiceExit+0x245. This function is in charge of handling the various call-styles used to enter kernel-mode, and then returning to user-mode.

With that said, let's switch to the other processor within the system that was involved and see what's going on at the time of the crash. To find out the active processors on the specific system, we'll use !running:

I used knL rather than the other stack dump commands because it prints frame numbers, which makes individual frames easier to reference.

Starting at frame # 2a, we can see the NtDeviceIoControlFile function calls IopXxxControlFile. The latter function appears to be undocumented, so I'm unsure as to what it does. What I do know is, the NtDeviceIoControlFile function is ultimately used to build descriptors for a driver. I imagine it's using the IopXxxControlFile function to aid in passing such to the driver.

Also, for what it's worth, although Microsoft's documentation describes NtDeviceIoControlFile as superseded by DeviceIoControl, the native function provides more information that may be beneficial to the caller (especially for debugging purposes). This is possibly why Bitdefender chose to use the native function instead.

So after neatly putting together this disassembly of sorts, we can see that this is indeed how the NtDeviceIoControlFile function is passing on the buffer and such to the driver.

The IoAllocateMdl function in this specific case is ultimately used to associate the MDL with an IRP, which is why IoAllocateIrp is called to allocate that IRP. IoGetAttachedDevice is likely called to return a pointer to the device object, with help from IoGetRelatedDeviceObject, which probably obtains the device object from the file system driver stack.

ObReferenceObjectByHandleWithTag is called to increment the reference count of the object, and to write a four-byte value known as a "tag" so it can support object reference tracing for debugging purposes. Finally, the ProbeForWrite function is called to ensure that a user-mode buffer meets the following:

Resides in the user-mode portion of the address space.

Is writeable.

Is correctly aligned.
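
To make those three checks concrete, here's a toy user-mode model in Python. It is not how the kernel implements ProbeForWrite; the user-space limit and the list of writable regions are assumptions I made up purely for illustration, and ProbeError stands in for the exception a failed probe would raise.

```python
# Toy model of the three ProbeForWrite checks. Not kernel code.
# USER_SPACE_LIMIT and the writable-region list are illustrative assumptions.

USER_SPACE_LIMIT = 0x0000_7FFF_FFFF_0000  # assumed top of the user-mode range


class ProbeError(Exception):
    """Stands in for the exception raised when a probe fails."""


def probe_for_write(address, length, alignment, writable_regions):
    """Raise ProbeError unless [address, address + length) passes every check."""
    if length == 0:
        return  # an empty buffer is not checked at all
    if address % alignment != 0:
        raise ProbeError("misaligned")
    end = address + length
    if end > USER_SPACE_LIMIT:
        raise ProbeError("outside user space")
    # Treat the buffer as writable only if one region fully contains it.
    if not any(start <= address and end <= stop
               for start, stop in writable_regions):
        raise ProbeError("not writable")
```

Calling probe_for_write(0x1000, 0x100, 4, [(0x1000, 0x3000)]) returns quietly, while an address of 0x1002 with the same arguments fails the alignment check.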

As all appears to have gone well, we can see the driver we were ultimately building and passing descriptors to/for was bdfwfpf.sys, which is Bitdefender's firewall filter driver. As it's a driver in charge of a firewall, it of course uses the WFP API (Windows Filtering Platform) to achieve its goals (not just filtering and monitoring).

We can confirm this easily by looking at the very first driver/function call after Bitdefender's firewall, which is into fwpkclnt.sys. Specifically, Bitdefender's firewall driver called it to inject new/cloned data into the data stream. Directly afterwards we have calls from the Network I/O Subsystem to continue the injection, which is because fwpkclnt.sys exports the kernel-mode functions, as opposed to fwpuclnt.dll, which exports and handles the user-mode side.

It looks like DPCs are used to handle and/or continue the injection into the data stream: KeInsertQueueDpc is called to queue a DPC for execution.
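
As a rough picture of what "queueing a DPC" means, here's a toy Python model of a DPC queue. The class and method names are mine, not anything from the kernel; the only behavior it borrows from KeInsertQueueDpc is that inserting a DPC object that is already queued does nothing and reports failure.

```python
# Toy model of queueing deferred work. Not kernel code; names are hypothetical.
from collections import deque


class Dpc:
    """Stand-in for a DPC object: a routine plus an 'already queued' flag."""

    def __init__(self, routine):
        self.routine = routine
        self.queued = False


class DpcQueue:
    def __init__(self):
        self._pending = deque()

    def insert(self, dpc):
        """Queue the DPC; return False if it is already waiting in the queue."""
        if dpc.queued:
            return False
        dpc.queued = True
        self._pending.append(dpc)
        return True

    def drain(self):
        """Run every pending routine in FIFO order; return how many ran."""
        ran = 0
        while self._pending:
            dpc = self._pending.popleft()
            dpc.queued = False
            dpc.routine()
            ran += 1
        return ran
```

The point of the model is only that the work does not run at insert time; it runs later, when the queue is drained.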

-- After discussion with a fellow kernel-debugger friend of mine, we also thought that the IRQL was possibly DISPATCH_LEVEL due to the multiple injections, so Windows deferred the work to a DPC. If that was the case, by the time the DPC was to be worked on, the system service had finished while the IRQL was still DISPATCH_LEVEL, and that gives us the bug check.
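
Here's a toy Python model of that theory, just to show the shape of the check. It is not how the kernel is written: the Cpu class and both handlers are hypothetical, and the only real values are PASSIVE_LEVEL (0), DISPATCH_LEVEL (2), and the 0x4A bug check code.

```python
# Toy model of the IRQL check made when a system service returns to user mode.
# Not kernel code; Cpu and the two handlers are hypothetical.

PASSIVE_LEVEL = 0
DISPATCH_LEVEL = 2
IRQL_GT_ZERO_AT_SYSTEM_SERVICE = 0x4A


class BugCheck(Exception):
    def __init__(self, code):
        super().__init__(f"bug check 0x{code:X}")
        self.code = code


class Cpu:
    def __init__(self):
        self.irql = PASSIVE_LEVEL


def system_service(cpu, handler):
    """Run the handler, then verify the IRQL before returning to user mode."""
    handler(cpu)
    if cpu.irql > PASSIVE_LEVEL:
        raise BugCheck(IRQL_GT_ZERO_AT_SYSTEM_SERVICE)
    return "returned to user mode"


def balanced_handler(cpu):
    cpu.irql = DISPATCH_LEVEL  # raise the IRQL for the work
    cpu.irql = PASSIVE_LEVEL   # lower it again before returning


def leaky_handler(cpu):
    cpu.irql = DISPATCH_LEVEL  # raised and never lowered
```

Running system_service with balanced_handler returns normally, while leaky_handler raises BugCheck with code 0x4A, which is the situation we think this dump shows.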

We continue through netio.sys' functions regarding the data stream injection, ultimately injecting the request to the stack and going through a few tcpip.sys functions.

To continue sending the data along, NDIS' NdisSendNetBufferLists function is called, and NDIS' filter driver (which I believe is pacer.sys) called NdisFSendNetBufferLists to send the list of network data buffers back to Bitdefender's firewall driver.

In order to do so, NDIS needs to call the HAL, which we can see through the function HalBuildScatterGatherList. What is supposed to happen next is, the HAL builds the scatter/gather list, and we go on through various registered miniport functions. However, this did not happen, and we go off the rails on frame #00 with a call to the miniport driver.

We get a lot of good information, and can see that Bitdefender's firewall filter driver is/was involved with this miniport. We know this, because we saw it all happening in the stack, but this just confirms it.

What appears to be happening here is that multiple NBLs in a chain are being passed, the FwpsStreamInjectAsync0 function is called to pass Bitdefender's data, and then the chain is broken as the call goes on (note that the NBL's Next member is zeroed out/NULL).
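
To picture what a broken chain means, here's a toy Python model of a singly linked NBL chain. The Nbl class is a stand-in I made up that models nothing but the Next link; it is not the real NET_BUFFER_LIST layout.

```python
# Toy model of an NBL chain. Not kernel code; only the Next link is modelled.


class Nbl:
    """Hypothetical stand-in for a NET_BUFFER_LIST."""

    def __init__(self, name):
        self.name = name
        self.next = None


def make_chain(names):
    """Link one Nbl per name and return the head (None for no names)."""
    head = prev = None
    for name in names:
        node = Nbl(name)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head


def walk(head):
    """Return the names reachable from head by following the Next links."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return names
```

With chain = make_chain(["nbl0", "nbl1", "nbl2"]), walk(chain) sees all three. After chain.next = None it sees only nbl0, so anything that still expects the rest of the chain is out of luck.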

A possible fix (in Bitdefender's case) is to avoid multiple injections inside the stream callout routine, possibly by taking the NBLs as a chain and calling the FwpsStreamInjectAsync0 function just ONCE per callout routine execution. Unsure, kernel development isn't my strong point : )
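
As a sketch of the difference between the two patterns, here's a toy Python comparison. The inject callback is a hypothetical stand-in and does not resemble the real FwpsStreamInjectAsync0 signature; the chain is just a Python list. This is only the idea, not a tested driver fix.

```python
# Toy comparison of injection patterns. Not kernel code; 'inject' is a
# hypothetical callback, not the FwpsStreamInjectAsync0 signature.


def inject_per_nbl(chain, inject):
    """One injection call per NBL: the pattern suggested to avoid."""
    for nbl in chain:
        inject([nbl])


def inject_once(chain, inject):
    """One injection call for the whole chain per callout execution."""
    if chain:
        inject(list(chain))
```

Passing a list's append method as inject makes the call count easy to see: three NBLs produce three calls with the first function and a single call with the second.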

A fix for users is to install this hotfix and hope it works, as it should. Overall, instead of making any development changes, Bitdefender could just raise awareness of this issue, for example by creating a well-explained documentation page with a link to the hotfix. I think development changes would be the better fix, though.