About

After spending most of the past decade without a decent computer, only laptops with GPUs better at toasting bread than at proper gaming, I finally cracked open the spare Bitcoin piggy bank and built my dream machine with an i7-4790K and an Nvidia 970 GPU inside it.

I could play Witcher 3 at last, with so many great games to catch up on. :-)

But before that I had to get the maximum performance the hardware can provide through overclocking.

The point is, I’m a nerd and nerds like to tweak things. We’re not the kind that puts up with bloated closed-source software and crappy Christmas-tree GUIs. I thus needed a simple and snappy tool to overclock my brand new GPU and bring it on par with a 980 model.

Let’s be honest here, the main contenders, or rather offenders, would make the eyes of any sane person bleed instantly:

Those tools are respectively from MSI, EVGA, Gigabyte and Asus.

A quick look at the interface and features suggests they are very similar and probably all built upon the same toolkit, which is called “RTHAL” in MSI Afterburner.

Anyways this is bad software and their authors should feel bad. Not everyone buying those graphics cards is a 14yo xXX_l33thaxor1ny0ma|\/|4_XXx who wants dragons and giant robots on their packaging.

There is no lightweight, open source overclocking software for power users these days, mostly because GPU makers won’t publish their docs. The situation needs a fix.

Where do we start?

Nvidia has an API to talk to their driver, at least under Windows. It’s conveniently named NvAPI and it is documented here: GPU Performance State Interface.

What would hit the hopeful coder square in the face when reading that is the very reduced set of functions available:

Yep, that also sucks, plenty of functions with “Get” in their names but almost none with “Set”.

Indeed, this public API is incomplete. After trawling the interwebs, it seems the full-featured API, headers, libs and docs are only provided under an NDA, and I suppose there is not the slightest chance I could access that information legitimately.

I don’t have time to waste jumping through those hoops, creating accounts and whatnot either. I just want to overclock my GPU, and for once the Internet doesn’t have anything of that sort readily available since RivaTuner, which was never open source in the first place.

So let’s grab a shovel and go deeper.

What do we know, what do we need?

We sure know that tools from card makers can do overclocking through NvAPI by accessing the undocumented functions.

Probably if there’s some “public” Get…

NvAPI_GPU_GetPstates20

then there’s a “private” Set… hiding somewhere:

NvAPI_GPU_SetPstates20

Maybe this is more complicated than that for all we know, so let’s start by running MSI Afterburner inside OllyDbg. Browsing the string references, we quickly land here:

“nvapi.dll” definitely gets loaded here using LoadLibrary/GetModuleHandle. We’re on the right track.

Now where exactly is that lib used? There could be thousands of occurrences.

That’s simple. With the program running and the realtime graph disabled (it polls NvAPI constantly, adding noise to the mass of API calls), we place a memory breakpoint on the .text segment of nvapi.dll inside MSI Afterburner’s process (just hit F2 in the segments window when NvAPI is highlighted…).

Then we set the sliders in the MSI tool to get some negligible GPU underclock and hit the “apply” button. It breaks inside NvAPI… magic!

But wait, this isn’t the “overclocking” (SetPstates20()) function. The symbol for the return pointer on the top of the stack shows something along the lines of “QueryInterface”.

Long story short, this “NvAPI_QueryInterface” function is the only exported function from nvapi.dll.

Its purpose is to take the ID of a function in the API and return a pointer to the actual code of the function in the mapped process. It probably serves as a convenient layer for not breaking the API across updates and also for obfuscating the entry points where the goods are to be found.
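To make the idea concrete, here is a minimal sketch of what such an ID-to-pointer resolver could look like. It is not Nvidia's code: the table, the `demo_*` names and both IDs are invented for illustration, and the real function presumably does more than a linear search.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic "pointer to some API function"; callers cast it to the real prototype. */
typedef void (*api_fn)(void);

/* One table entry: a function ID and the code it resolves to. */
struct id_entry {
    uint32_t id;
    api_fn   fn;
};

/* Stand-ins for real driver functions (hypothetical, for illustration only). */
static int demo_get(void) { return 1; }
static int demo_set(void) { return 2; }

/* Made-up IDs; the real ones are opaque 32-bit constants chosen by the vendor. */
static const struct id_entry demo_table[] = {
    { 0x11111111u, (api_fn)demo_get },
    { 0x22222222u, (api_fn)demo_set },
};

/* Mimics the resolver: returns the function for an ID, or NULL when unknown. */
static api_fn demo_query_interface(uint32_t id)
{
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++) {
        if (demo_table[i].id == id)
            return demo_table[i].fn;
    }
    return NULL;
}
```

A caller would write something like `int (*set)(void) = (int (*)(void))demo_query_interface(0x22222222u);` and check for NULL before calling it. Because names never appear in the export table, the vendor can move or hide functions freely, which is exactly the obfuscation effect described above.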

Actually, if you get the NvAPI SDK from Nvidia’s website you’ll find a linkable module inside the archive. It does no real work of its own and just acts as an “exports proxy”: it exports the names of all the public functions of the API, and when one of them is called it retrieves the real pointer with the ID it holds and executes the real function.

Ultimately the end user/programmer doesn’t have to be aware of all those ID things. They would just call the public functions and link the module from the public SDK using the public headers.

You may already have guessed, I don’t want to proceed that way.

Fortunately, if you look again at the previous screenshot of OllyDbg inside the QueryInterface() function, you’ll find the sole INT argument to the function on top of the stack just under the return pointer: it’s 0xF4DAE6B. We’re getting closer!

Let’s continue running the program in Olly and break a second time on NvAPI. We learn from the symbols floating around that MSI Afterburner just initiated a call to “Nv_SetPStates20()”, so 0xF4DAE6B is certainly the ID of the function we’re looking for.

Good, now we just need its prototype and arguments so that we can declare and use it inside our own code.

Also a quick web search for 0xF4DAE6B yielded this very interesting result where an amazing Russian dude with only 2 messages on his stackoverflow.com profile still found a way to drop this sweet piece of data which looks disturbingly like what the NDA version of the API would be:

In IDA Pro we can also check the Xrefs from NvAPI_QueryInterface, and we land at the start of the data section with a big array of INTs grouped in pairs, each pair holding the address of a function and the associated Nvidia function ID:

Then again, IDs and addresses are valid according to the information we already have.

It means we are now sure of the location of “GetPstates20” and “SetPstates20”. We can break directly inside them at will. Let’s do that in IDA after importing the nvapi.h headers so IDA knows about the structs in use: grab the pointer for the second argument on the stack just when entering “GetPstates20”, dereference it and apply the type of an “NV_GPU_PERF_PSTATES20_INFO_V1” struct to it.

Now all those apparently garbage values are starting to make sense.

We can confirm everything is correct as we expected by comparing one of the values to an authoritative measurement. Here 0xD8ACC stands for the GPU vCore represented as µVolts. It is 887500 in base 10, meaning 887.5mV or 0.8875V. The GPU-Z tool reports a similar value.
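The conversion is plain arithmetic, but it is easy to slip a factor of 1000 when eyeballing hex dumps, so here is a small sketch of it. The helper name is mine, not something from NvAPI.

```c
#include <stdint.h>

/* Convert a raw reading expressed in microvolts to millivolts. */
static double microvolts_to_millivolts(uint32_t microvolts)
{
    return microvolts / 1000.0;
}
```

Feeding it the value read from the struct, `microvolts_to_millivolts(0xD8ACC)`, gives 887.5, which is the figure to compare against the GPU-Z readout.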

It seems we’re doing fine:

For good measure let’s go back a bit and put a conditional logging breakpoint in Olly at the beginning of the “QueryInterface” function in order to log ALL the function IDs successively requested by MSI Afterburner. Just in case things don’t go smoothly and we encounter a difficult pipeline setup before being able to overclock the GPU.

Despite the stackoverflow post being somewhat outdated or incomplete we can still name the majority of the functions called.

Reading through that quickly shows the obvious things one would expect: init NvAPI, get various pieces of information, finally call SetPstates20 (when we hit a breakpoint attempting some small underclock) and clean up the API.

Considering “GetVbiosVersionString” is called just before the overclocking related function and it’s a purely GUI/dashboard info feature we can safely assume that no particular setup is required, we just need the correct arguments to call said function.

Reversing the function’s arguments

This one was supposed to be hell but it actually went better than previously thought:

– Most NvAPI functions including GetPstates20 take a physical GPU handle as their first argument.

– GetPstates20’s second argument is a struct for storing Pstates and it is documented in the public NvAPI headers.

– A quick look at the code calling “SetPstates20” in MSI Afterburner shows 2 arguments pushed before the call. The first one is “0x100” for both the GET and SET functions: it is the handle for our GPU#0.

It is then highly likely that “SetPstates20” will take the same kind of struct as “GetPstates20” for its second argument, with a few edited values.

Let’s get coding

First, let’s isolate the data structures we need from the NvAPI headers because there are far too many lines in that file and I’m lazy to the point that scrolling hurts my finger. Also those are mostly ints so we’ll get rid of all the fancy names and make them regular uint/int for readability.
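As a rough idea of what the trimmed-down structures can look like, here is a sketch that mirrors the Pstates 2.0 layout as I understand it from the public nvapi.h, with every typedef flattened to plain 32-bit ints. The field names are shortened and the one-bit "editable" flags are folded into whole `flags` words, so treat the naming as mine and double-check the layout against the header you actually have. The static assertion ties the result back to the 0x1c94 size seen in the debugger.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

enum { MAX_PSTATES = 16, MAX_CLOCKS = 8, MAX_BASE_VOLTAGES = 4 };

/* A signed offset plus the range the driver accepts for it. */
struct param_delta {
    int32_t value;
    int32_t min;
    int32_t max;
};

struct clock_entry {
    uint32_t domain_id;                 /* which clock this entry describes */
    uint32_t type_id;                   /* single frequency or range */
    uint32_t flags;                     /* bit 0: editable */
    struct param_delta freq_delta_khz;  /* the overclock offset, in kHz */
    union {
        struct { uint32_t freq_khz; } single;
        struct {
            uint32_t min_freq_khz, max_freq_khz;
            uint32_t voltage_domain_id;
            uint32_t min_voltage_uv, max_voltage_uv;
        } range;
    } data;
};

struct base_voltage_entry {
    uint32_t domain_id;
    uint32_t flags;                     /* bit 0: editable */
    uint32_t volt_uv;
    struct param_delta volt_delta_uv;
};

struct pstate_entry {
    uint32_t pstate_id;
    uint32_t flags;                     /* bit 0: editable */
    struct clock_entry clocks[MAX_CLOCKS];
    struct base_voltage_entry base_voltages[MAX_BASE_VOLTAGES];
};

struct pstates20_info_v1 {
    uint32_t version;
    uint32_t flags;                     /* bit 0: editable */
    uint32_t num_pstates;
    uint32_t num_clocks;
    uint32_t num_base_voltages;
    struct pstate_entry pstates[MAX_PSTATES];
};

/* Must match the struct size observed while reversing. */
static_assert(sizeof(struct pstates20_info_v1) == 0x1c94,
              "layout does not match the expected 0x1c94 bytes");
```

If the assertion fires on your compiler, something in the layout (padding, a missing field) differs from the driver's view and the calls would most likely be rejected.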

Then we need prototypes for the functions we’ll use. Remember we won’t call the provided exports from the NvAPI lib inside the SDK but rather retrieve the function pointers directly from the running nvapi.dll and execute them as such.

Here is a handful of convenient function prototypes to get some info, retrieve the clocks and set them. Some of them can be found inside the public API, the others are probably from the NDA version. We use the same techniques as mentioned earlier to learn about them:

Time for the main() function, the code should be fairly short and this is just a PoC or whatever, brace yourself for screaming KNF nazis.

We’ll need those variables, they are of lesser importance, just the last line is a requirement.

“NV_GPU_PERF_PSTATES20_INFO_V1” is the root struct holding all the clocking and power data for the selected GPU handle. The size of this struct is 0x1c94. For some reason Nvidia decided to use that as the “version” field after adding 0x10000 to it, so we set that field to 0x11c94; otherwise the subsequent calls using the structure return a cryptic error code.

Now we actually load “nvapi.dll” in our program’s memory space and retrieve the “nvapi_QueryInterface” export that will provide us with the pointers for all the other functions. We then call it successively with all the IDs we need and assign the results to our function pointers.
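As far as I can tell from the public header, this is the SDK's general convention rather than a one-off quirk: the low 16 bits carry the struct size and the revision number sits above them. A minimal sketch of that computation, with a helper name of my own choosing:

```c
#include <stdint.h>

/* Pack a struct size (low 16 bits) and a revision number (upper bits)
 * into one "version" word, the way the version field appears to be built. */
static uint32_t make_struct_version(uint32_t struct_size, uint32_t revision)
{
    return struct_size | (revision << 16);
}
```

With the size found above and revision 1, `make_struct_version(0x1c94, 1)` yields 0x11c94. For sizes below 0x10000 the bitwise OR and the addition mentioned above give the same result.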
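The resolving step can be sketched independently of Windows. In the real program the resolver would come from the Win32 calls `LoadLibraryA("nvapi.dll")` and `GetProcAddress(module, "nvapi_QueryInterface")`; here it is passed in as a function pointer so the logic can be tried anywhere. The `query_interface_fn` type, `resolve_ids` and the fake resolver are my own illustrative names, and the exact prototype and calling convention of the real export are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the resolver: one 32-bit ID in, code address (or NULL) out. */
typedef void *(*query_interface_fn)(uint32_t id);

/* Resolve `count` IDs into `out`. Returns 0 on success, -1 as soon as
 * the resolver is missing or one ID is unknown to this driver version. */
static int resolve_ids(query_interface_fn query, const uint32_t *ids,
                       void **out, size_t count)
{
    if (query == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        out[i] = query(ids[i]);
        if (out[i] == NULL)
            return -1;
    }
    return 0;
}

/* Stand-in resolver for trying the code without the DLL (hypothetical):
 * it only "knows" the SetPstates20 ID found earlier. */
static int fake_target;
static void *fake_query(uint32_t id)
{
    return id == 0x0F4DAE6Bu ? (void *)&fake_target : NULL;
}
```

Checking every returned pointer matters here: an ID that a given driver version does not know comes back as NULL, and calling through it would crash the program rather than report an error.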

What it does is simple. Frequencies in the struct are expressed in kilohertz, so we multiply the frequency offset provided by the user by 1000, taking care not to cause an integer overflow that may induce a frying core. ;-)

No seriously, we “allocate” some “NV_GPU_PERF_PSTATES20_INFO_V1” yet again, but it’s a bloated pain and we only want a few values changed, so we make it an empty buffer the same size as the struct.

We then fill the first int with the magic 0x11c94 version number. The 2nd and 3rd ints with 1, probably meaning we’ll provide only one Pstate profile containing only one Clock domain to the SetPstates20() function.

But if we do that… it works for the GPU but not for the VRAM, how do we overclock the damn VRAM?

At this point I was saying to myself “why can those guys get an overclock in their crappy soft and I cannot, that’s just unfair”. But this approach never gives any noteworthy result so I got back in IDA and diffed my struct with the struct that MSI Afterburner provides to the same call when overclocking the RAM.

And the trick was there before my eyes, the 7th int of the struct had changed from 0 to 4. This field is probably used as a flag with 0 being the GPU, 2 may or may not be the separate shaders clock domain for the previous GPU generations of that kind and 4 would then be the VRAM.
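Putting the last few paragraphs together, building the request buffer could look like the sketch below. Two things in it are my interpretation rather than something stated here: I read the positions as zero-based 32-bit word indices (which is where the public header layout puts the two counts and the clock domain), and I place the frequency delta in word 10 because that is where the same layout puts it. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

enum {
    PSTATES20_V1_SIZE    = 0x1c94,   /* struct size in bytes */
    PSTATES20_V1_VERSION = 0x11c94   /* size + 0x10000 */
};

/* Word indices into the buffer (assumed zero-based, see the text above). */
enum {
    WORD_VERSION     = 0,
    WORD_NUM_PSTATES = 2,
    WORD_NUM_CLOCKS  = 3,
    WORD_DOMAIN      = 7,
    WORD_FREQ_DELTA  = 10   /* not stated in the article: assumption */
};

/* Clock domain flags as observed: 0 for the GPU core, 4 for the VRAM. */
enum { DOMAIN_GPU = 0, DOMAIN_VRAM = 4 };

/* Allocate a zeroed request carrying one Pstate with one clock entry.
 * Returns NULL on allocation failure; the caller frees the buffer. */
static uint32_t *build_set_request(uint32_t domain, int32_t delta_khz)
{
    uint32_t *buf = calloc(1, PSTATES20_V1_SIZE);
    if (buf == NULL)
        return NULL;
    buf[WORD_VERSION]     = PSTATES20_V1_VERSION;
    buf[WORD_NUM_PSTATES] = 1;
    buf[WORD_NUM_CLOCKS]  = 1;
    buf[WORD_DOMAIN]      = domain;
    buf[WORD_FREQ_DELTA]  = (uint32_t)delta_khz;  /* stored as a signed value */
    return buf;
}
```

One buffer would be built per domain, for instance `build_set_request(DOMAIN_GPU, gpu_khz)` and `build_set_request(DOMAIN_VRAM, ram_khz)`, each handed to the resolved SetPstates20 pointer together with the GPU handle and freed afterwards.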

At last we clean up the DLL and end our program:

NvUnload();
return 0;

What’s left to be done?

Testing our new toy obviously! We compile that thing and run it:

C:\>overclock.exe [+/- GPU MHz offset] [+/- RAM MHz offset]
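Since the offsets arrive as MHz strings on the command line while the struct wants kHz, the parsing step is where the overflow guard belongs. Here is a minimal sketch of it using the standard `strtol`; the function name and the choice to reject trailing garbage are mine.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a signed MHz offset such as "+135" or "-100" and convert it to kHz.
 * Returns 0 on success, -1 if the text is not a number or the result
 * would not fit in a 32-bit signed value. */
static int parse_offset_khz(const char *arg, int32_t *khz_out)
{
    char *end = NULL;
    errno = 0;
    long mhz = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;                       /* empty, garbage or out of range */
    if (mhz > INT32_MAX / 1000 || mhz < -(INT32_MAX / 1000))
        return -1;                       /* mhz * 1000 would overflow */
    *khz_out = (int32_t)(mhz * 1000);
    return 0;
}
```

A real tool would probably also clamp the value to something sane for the card, since a number that fits in 32 bits can still be a very bad idea for the hardware.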

Here we run two benchmarks using a basic MD5 bruteforcing so that we can be sure the modified clock speed is effective and we didn’t just change some funny numbers for display only.

First at stock frequency (950MHz) and then with a 100MHz underclock (the dev machine is a laptop and I don’t intend to make it faster, so subtracting 100 will suffice).

And… it works! We’re done here guys.

I am not aware of any other open source implementation of such a tool. It might only be a very simple C program in the end, but it exists, and the minimum required details for overclocking an Nvidia GPU programmatically are now public and in plain text.

This code, for what it’s worth, is free as in free beer: take it, polish it, make a LIGHT (not the usual stellar poop, we already have those) GUI for your own needs and enjoy.

Here’s a binary build of the program, rename it as .exe as wordpress.com will only let users upload media files: