Thursday, March 31, 2016

Introduction to Native (aka Undocumented) API

Hi all,

My second post to my new blog and I am even more excited than I was when I posted the first one. That is because this week I was looking at some really "hackish" stuff. Anyway, if you are here then perhaps you already have heard of the so called "undocumented api" or "native api". If you are checking this post as part of my updates sent to forums and sites like linkedin then perhaps you may not know have heard of this so called undocumented API or native API. So take a deep breath... this is going to be a long read...

This post is for beginners who are developers, who have used C, C++ and Windows API already and who wants to dive into the "hackish" dimensions of Windows. Windows gives a mammoth API for programmers who develop with C/C++/Assembly language. Windows also uses this API. However Windows OS also have some "hidden" interfaces that it uses apart from the Windows (public) API. I assume you are aware of this documented API aka Windows API.

So today it is a misnomer when we it as "undocumented API". Most of these native APIs are documented now.. Starting Windows NT days till today, there were some belligerent and talented hackers who did surgery on Windows OS and studied the internal facilities (code) that the operating system uses and also exposes to public especially programmers. Native API had a very advanced documentation in sysinternals website earlier, when it wasn't part of Microsoft and the articles were "visible". There are so many books and MSDN articles written on it. I would primarily thank, for this article, Matt Pietrek, Mark Russinovich, Sven Shreiber, Gary Nebbet, Prasad Dabak, Milind Borate... I may have missed some names.. I learned from many.. so apologies to them who know I follow them but I didn't take their name here... I also have referred exploit-monday.com and rohitab.com for some code. I used exploit-monday to refer the AMAZING documentation of some "undocumented" data structures and functions put by Matt Graeber. I only referred rohitab's forum to check if my code is "really fine".

The code I have put here is definitely different from rohitab's because I use explicit linking. But before I talk about all that... firstly what is this so called "native" or "undocumented" API? Well if you think about all the available application programming interface that Windows provides as layers, then one layer (almost the top) comprises of Windows API that is offered but a huge set of DLL files like KernelBase(kernel32), User32, Gdi32, advapi32, crypt32 etc... Now if that is the first layer or the interface you use as a programmer, then the next layer is "NTDLL" layer (let us put it this way to simplify). To get a clear understanding I would ask the reader to refer to the popular Windows Internals books.. They are called "undocumented" because Microsoft never documented these functions calls and data structures officially (until now where there is some partial documentation). NTDLL.DLL is like a layer by itself and most of the API calls from the other DLLs call functions of NTDLL.DLL. This is one of the major "code path" and by code path I mean flow of code from user mode to kernel mode. NTDLL is like the last layer of user mode and then the code switches mode to kernel mode and your code (actually in form of system calls and requests to system) continue executing. The user mode Windows API is quite well documented. The kernel mode programming interface is provided by ntoskrnl, hal etc and that is documented too. Driver developers will be and should be well aware of those interfaces. Native API sits somewhere in between but isn't well documented. Microsoft uses this layer to do proprietary advancement to "chewing" user mode code and then pushing "massaged" code to kernel mode for execution. You can imagine this is an abstraction offered to the Windows API itself. You call Windows API. Windows API calls NTDLL functions (native api) and then there is mode switch to kernel and so on..
Did that simplify it better than any other documentation about native API? I don't know... if you think so please comment on this article. I can edit and make it better if needed.

Okay, so why do you need to know about "native api"? - You can easily achieve almost all tasks with Windows API. Even higher frameworks offered to programmers like .Net, Java etc all use Windows API. Even the OS uses some of the functions offered by this "Windows API" layer. You would still use native api to know more about what happens inside. You use it because it can give more information and all that perhaps in a single function call. You use it because you skip a layer (WinAPI) and so your code saves several CPU cycles as you skip a lot of instructions that make the Windows API. Hackers and Anti-hackers have been using the undocumented interfaces for a long time. Hooks and Undocumented APIs are favorite for both hackers and anti-hackers. If you are lucky you will find an article about how Symantec anti-virus got into trouble because of using undocumented interfaces (not necessarily native API) after the introduction of Windows Vista. Microsoft warns users from using these APIs because they can change. For most tasks programmers SHOULD use Windows API.

Ok enough of theory, let us put it to practical use... what I will show in this article is a famous code that people usually look for.. "How to list processes using native api" or "NtQuerySystemInformation"?

"NtQuerySystemInformation" is a very powerful function that resides in ntdll.dll. As the name says, it queries a lot of system information. You choose the "information" that you want from an "enumeration". Matt has re-documented (seems to be the latest) in exploit-monday.com. It was indeed documented well, in year 2000 by Gary Nebbet. As said earlier, Microsoft can anytime make any changes to these structures, enumerations and functions. I haven't really done a diff but I could see a few more "information type" added to this enumeration in Matt's.

I assume the reader knows to use LoadLibrary() and do explicit linking. I won't explain all those elementary stuff here. So here is what we need to do, to enumerate the processes currently managed by the operating system... (experts see how careful I was when I said that (lol) - processes don't run + there can be terminated-but-hanging-around processes (zombies) - refer Mark/David/Alex discussion in their internals-book as well as some forums).

You may use winternl.h to get the limited Microsoft documentation (and access) of native api/structs. I prefer to have my own list and I used Matt's documentation said earlier. winternl.h has been used and ntdll library has been implicitly linked, in the code that is shown in rohitab, that is if you want to do it that way... We will do explicit linking here...pretty much what people are mostly looking for...

We are using the undocumented NtQuerySystemInformation function to get some process information stored in the memory that is managed by the operating system. This information is SYSTEM_PROCESS_INFORMATION structure and we have one per process. We do not know how many processes... So allocate a large blob. Thankfully Heap functions do support this amount I gave. If not use VirtualAlloc (rohitab.com). Given one chunk of this information, it has an offset to the next chunk of the same "information set". Use that "link" to traverse all "chunks" until you reach end of blob. We don't check if it is end of blob, instead we see if "NextEntryOffset" goes 0. The rest is basics..