PInvoke pointer safety: Replacing IntPtr with unsafe struct pointers

Introduction

When using .NET Platform Invoke tools to call native C functions in DLLs, it's common for parameters to contain pointers to structures, classes, or strings. Sometimes it's practical to marshal the data into a managed representation, such as when marshalling a C char* string into a .NET String.

However, other times the complexity of the native data can be left entirely in the native code. We call these pointers opaque pointers, because managed code never needs to understand the data they point to. Instead, it is our responsibility to receive, store, and produce the opaque native pointer when the API requires it.

The .NET CTS provides a value type called IntPtr, which can be used to store opaque native pointers. However, it has a serious drawback: as far as the type system is concerned, all IntPtrs are the same type. If your native library passes several types of pointers across the PInvoke boundary, the compiler can't help you make sure you provided the right IntPtr in the right situation. Providing the wrong pointer to a native C-DLL entry point will at best produce erroneous results, and at worst crash your program.

A safer alternative to IntPtr is the unsafe struct *. Just like IntPtr, unsafe struct pointers are a value type used to store opaque pointer data that won't be accessed from managed code. However, because the structs themselves have types, the compiler can typecheck and ensure the proper pointer type is supplied to the proper native entry point.

Note: Another alternative to IntPtr is the SafeHandle pattern introduced in .NET v2.0 "Whidbey". While this article will focus on the raw use of unsafe struct pointers, we'll also compare this approach to SafeHandle.

Pointer Marshalling

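Let's begin by looking at a basic pointer marshalling situation. We will use the .NET wrapper around the Clearsilver HTML templating library as an example. Two functions in the C Clearsilver DLL are hdf_init, which takes an HDF ** and fills in a newly created HDF pointer, and hdf_set_value, which sets a value on an existing HDF pointer. Both return a NEOERR pointer.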

These function calls contain two different pointer types, NEOERR and HDF. Both are pointers to native structures. When using the Clearsilver library from C, those structures are encapsulated opaque data, manipulated only through functions. Our .NET code will treat them the same way.

One could use PInvoke to provide access to these functions using IntPtr. Our C# .NET imports might look like:
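As a rough sketch of what such imports might look like, the block below declares the two entry points twice. The library name "libneo" and the class names are assumptions made for illustration, and the C prototypes in the comment should be checked against your copy of the Clearsilver headers. The first class uses IntPtr everywhere; the second declares the same imports with pointers to empty unsafe structs, which is the style the rest of this article relies on.

```csharp
using System;
using System.Runtime.InteropServices;

// Native prototypes being imported (check against the Clearsilver headers):
//   NEOERR* hdf_init(HDF **hdf);
//   NEOERR* hdf_set_value(HDF *hdf, const char *name, const char *value);

// Style 1: every native pointer is an IntPtr.
// Passing a NEOERR pointer where an HDF pointer is expected compiles fine.
internal static class HdfIntPtrImports
{
    [DllImport("libneo")] // library name is an assumption
    internal static extern IntPtr hdf_init(ref IntPtr hdf);

    [DllImport("libneo")]
    internal static extern IntPtr hdf_set_value(IntPtr hdf, string name, string value);
}

// Style 2: empty structs exist only to give each native pointer its own type.
// Managed code never reads or writes their contents.
internal struct HDF { }
internal struct NEOERR { }

internal static unsafe class HdfTypedImports
{
    [DllImport("libneo")]
    internal static extern NEOERR* hdf_init(HDF** hdf);

    // Passing a NEOERR* as the first argument is now a compile-time error.
    [DllImport("libneo")]
    internal static extern NEOERR* hdf_set_value(HDF* hdf, string name, string value);
}
```

Both styles pass exactly the same pointer-sized value across the boundary; only the compile-time type differs. The second style has to be compiled with unsafe code enabled.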

If we instead declare empty unsafe structs named HDF and NEOERR and use pointers to them in the imports, the compiler knows the specific types of the HDF and NEOERR pointers and can tell them apart. A C# managed class can then control access to these unsafe pointers, while still receiving compiler typechecking that the proper pointers are provided at the proper entry points.

An often-raised objection to this method is that the code now requires the unsafe keyword. However, keep in mind that there is nothing "safe" about IntPtr and [DllImport], either of which can easily crash the runtime. In fact, if it were up to me, [DllImport] would be treated as unsafe, IntPtr wouldn't exist, and unsafe struct pointers would be the advocated mechanism to handle all native pointers.

Below we've expanded the example to include a constructor and a "safe" setValue() method.
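The following is a minimal sketch of such a class, not the actual wrapper source. The library name "libneo" is an assumption; the class, field, and method names follow the ones used in this article.

```csharp
using System;
using System.Runtime.InteropServices;

// Empty marker structs: they only give the native pointers distinct types.
internal struct HDF { }
internal struct NEOERR { }

public unsafe class Hdf
{
    [DllImport("libneo")] // library name is an assumption
    private static extern NEOERR* hdf_init(HDF** hdf);

    [DllImport("libneo")]
    private static extern NEOERR* hdf_set_value(HDF* hdf, string name, string value);

    // internal: visible to the wrapper assembly, hidden from its callers.
    internal HDF* hdf_root;

    public Hdf()
    {
        // hdf_root is a field of a heap object, so it must be pinned
        // before its address (an HDF**) can be handed to native code.
        fixed (HDF** root = &hdf_root)
        {
            hdf_init(root); // returned NEOERR* is ignored in this sketch
        }
    }

    // "Safe" in the sense that callers never see or supply a native pointer.
    public void setValue(string name, string value)
    {
        hdf_set_value(hdf_root, name, value); // returned NEOERR* is ignored here too
    }
}
```

A caller in another assembly would simply write `new Hdf().setValue("a.b", "1")` and never touch a pointer.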

Take special note of the internal access modifier on the unsafe instance pointer HDF *hdf_root. This ensures that only our managed wrapper assembly has permission to access this pointer. Also note that we're not yet addressing memory lifetime issues, which we'll briefly cover in the next section.

Memory Lifetime

The main intent of this article is to cover how to replace use of the generic undifferentiated IntPtr with more specifically typed unsafe struct pointers. However, we're going to briefly look at some of the issues related to memory lifetime for these pointers.

If the code above was used as-is, when the garbage collector released an Hdf instance, the memory pointed to by hdf_root would leak. Further, each call to hdf_init or hdf_set_value potentially leaks a NEOERR structure which should be freed if it exists.

In the Clearsilver C# wrapper we handle these cases using a combination of finalization and pointer wrappers. The Hdf class is given a destructor (finalizer) which frees the native pointer when the garbage collector finalizes the object. To simplify error handling for the commonly used NEOERR type, a separate NeoErr class is created. This class has a static method hNE to handle the NEOERR return value convention of "zero on success, an object which must be freed on error". That static method looks at the return value and throws an exception if it is non-zero.
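A rough sketch of how these two pieces might fit together is shown below. It assumes the Clearsilver entry points hdf_destroy(HDF **) and nerr_ignore(NEOERR **) and the library name "libneo"; the exception type and message are placeholders, and the real NeoErr class in the wrapper may do more, such as extracting the native error text.

```csharp
using System;
using System.Runtime.InteropServices;

internal struct HDF { }
internal struct NEOERR { }

internal static unsafe class NeoErr
{
    [DllImport("libneo")] // assumed entry point: frees the error and nulls the pointer
    private static extern void nerr_ignore(NEOERR** err);

    // hNE = "handle NEOERR": null means success, anything else is an error
    // object that must be freed before we throw.
    internal static void hNE(NEOERR* err)
    {
        if (err == null) return;
        nerr_ignore(&err); // err is a by-value parameter, so no fixed is needed
        throw new InvalidOperationException("A Clearsilver call returned an error.");
    }
}

public unsafe class Hdf
{
    [DllImport("libneo")] private static extern NEOERR* hdf_init(HDF** hdf);
    [DllImport("libneo")] private static extern NEOERR* hdf_set_value(HDF* hdf, string name, string value);
    [DllImport("libneo")] private static extern void hdf_destroy(HDF** hdf); // assumed entry point

    internal HDF* hdf_root;

    public Hdf()
    {
        fixed (HDF** root = &hdf_root) { NeoErr.hNE(hdf_init(root)); }
    }

    public void setValue(string name, string value)
    {
        NEOERR* err = hdf_set_value(hdf_root, name, value);
        GC.KeepAlive(this); // keep the finalizer from running during the native call
        NeoErr.hNE(err);
    }

    // Finalizer: releases the native tree if it was ever created.
    ~Hdf()
    {
        if (hdf_root != null)
        {
            fixed (HDF** root = &hdf_root) { hdf_destroy(root); }
        }
    }
}
```

Routing every native return value through hNE keeps the leak handling in one place, so individual wrapper methods stay one or two lines long.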

Another issue with the above code is that we would like HDF strings to be UTF-8, not ASCII. We'll solve this by using a custom string marshaller which uses UTF-8 instead of ASCII/ANSI strings. You can read more about this technique in my article on Advanced Topics in PInvoke String Marshaling.
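For orientation, here is a minimal sketch of a UTF-8 string marshaller built on the framework's ICustomMarshaler interface. The class name Utf8StringMarshaler and the library name "libneo" are assumptions, and this is not necessarily how the Clearsilver wrapper implements it. The sketch is meant for strings passed into native code; using it for native return values would require deciding who owns the returned buffer.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public sealed class Utf8StringMarshaler : ICustomMarshaler
{
    private static readonly Utf8StringMarshaler Instance = new Utf8StringMarshaler();

    // The runtime locates this static method by name and signature.
    public static ICustomMarshaler GetInstance(string cookie)
    {
        return Instance;
    }

    // Managed string -> newly allocated, NUL-terminated UTF-8 buffer.
    public IntPtr MarshalManagedToNative(object managedObj)
    {
        if (managedObj == null) return IntPtr.Zero;
        string text = managedObj as string;
        if (text == null) throw new MarshalDirectiveException("Only strings are supported.");
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        IntPtr buffer = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, buffer, bytes.Length);
        Marshal.WriteByte(buffer, bytes.Length, 0);
        return buffer;
    }

    // NUL-terminated UTF-8 buffer -> managed string.
    public object MarshalNativeToManaged(IntPtr pNativeData)
    {
        if (pNativeData == IntPtr.Zero) return null;
        int length = 0;
        while (Marshal.ReadByte(pNativeData, length) != 0) length++;
        byte[] bytes = new byte[length];
        Marshal.Copy(pNativeData, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }

    // Frees buffers allocated by MarshalManagedToNative.
    public void CleanUpNativeData(IntPtr pNativeData)
    {
        Marshal.FreeHGlobal(pNativeData);
    }

    public void CleanUpManagedData(object managedObj) { }

    public int GetNativeDataSize() { return -1; }
}

internal struct HDF { }
internal struct NEOERR { }

internal static unsafe class HdfUtf8Imports
{
    [DllImport("libneo")] // library name is an assumption
    internal static extern NEOERR* hdf_set_value(
        HDF* hdf,
        [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(Utf8StringMarshaler))] string name,
        [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(Utf8StringMarshaler))] string value);
}
```

The marshaller is attached per parameter, so it combines freely with the typed HDF* and NEOERR* pointers used elsewhere in the import.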

To see the details of the NeoErr class, the full Hdf and Cs wrappers, or more examples of how to use unsafe structs for safer PInvoke entrypoints, check out the full Clearsilver C# wrapper, and the clearsilver source kit available from clearsilver.net.

Points of Interest

The above unsafe struct pointer usage is valid according to the Common Type System spec and works in Microsoft .NET. However, several versions of the Mono runtime's marshalling code did not properly handle unsafe struct pointers in PInvoke entry points. Therefore, you'll need the very latest Mono 2.12 (or a very old version of Mono) for this to work.

Another model for marshalling pointers without the type danger of IntPtr is to use SafeHandle. Like unsafe struct pointers, SafeHandles replace IntPtr in [DllImport] entry points, allowing strongly typed native pointer handling. Unsafe struct pointers provide a familiar coding idiom for C/C++ programmers, allowing the use of try/finally, finalizers, and constructs such as the double-indirect pointer required in the above hdf_init(HDF **hdf) call. SafeHandle, on the other hand, provides a solution to a tricky GC finalizer race condition which Platform Invoke code can fall victim to.
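For comparison, a SafeHandle-based version of the same imports might look roughly like the sketch below. The names HdfSafeHandle and HdfSafeImports, the library name "libneo", and the use of hdf_destroy for cleanup are assumptions; it has not been tested against the Clearsilver library. It relies on the marshaller's documented ability to create a new SafeHandle instance for an out parameter, which is one way to cover the HDF ** case.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

internal sealed class HdfSafeHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    [DllImport("libneo")] // assumed entry point: void hdf_destroy(HDF **hdf)
    private static extern void hdf_destroy(ref IntPtr hdf);

    // The marshaller needs a parameterless constructor to create
    // the instance it returns through an out parameter.
    public HdfSafeHandle() : base(true) { }

    // Called at most once, and only if the handle is valid.
    protected override bool ReleaseHandle()
    {
        hdf_destroy(ref handle);
        return true;
    }
}

internal static class HdfSafeImports
{
    // HDF** becomes "out HdfSafeHandle"; the NEOERR* return value is left
    // as IntPtr here only to keep the sketch short.
    [DllImport("libneo")]
    internal static extern IntPtr hdf_init(out HdfSafeHandle hdf);

    // HDF* becomes a by-value HdfSafeHandle, which the runtime keeps alive
    // for the duration of the call.
    [DllImport("libneo")]
    internal static extern IntPtr hdf_set_value(HdfSafeHandle hdf, string name, string value);
}
```

The type safety is the same as with HDF*, since a different SafeHandle subclass cannot be passed where HdfSafeHandle is expected. The trade-off is an extra managed object per native pointer in exchange for lifetime handling done by the runtime.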


About the Author

David Jeske is an Entrepreneur and Computer Programmer, currently living in San Francisco, California.

He earned his B.S. in Computer Engineering at the University of Illinois at Champaign/Urbana (UIUC), and has worked at several Silicon Valley companies, including Google, Yahoo, eGroups.com, 3dfx, and Akklaim Entertainment. He has managed and architected extremely high-traffic websites, including Yahoo Groups and orkut.com, and has experience in a broad spectrum of technology areas including scalability, databases, drivers, system software, and 3d graphics.

You can contact him at davidj -a-t- gmail (dot) com for personal messages about this article.

Comments and Discussions

5 for me, and Jonathan had a point. You can create and use a typed SafeHandle instead of a raw pointer. The advantage is that you are sure to release the memory or handle while keeping strongly typed code. I used IntPtr in the past. I vote 5 because I had not thought about using raw pointers, and even if SafeHandles are even better, a raw pointer is a step forward from IntPtr.

Thanks for the pointer to SafeHandle. That is certainly a nice improvement to IntPtr. I've only just been introduced to the mechanism. From a quick study, it appears one advantage of SafeHandle is that it solves a tricky race condition in GC finalization and DllImport entry points...

I'll have to do some more investigation to understand how to use SafeHandle to marshal double-indirect pointers, such as the HDF ** pointer example in the article. It should be possible as this example seems to do it..

Double indirection? Let's assume that HDF (please expand the acronym in C#) is a SafeHandle. YMMV (I have an immense amount of faith in the .Net team), but I assume HDF* would give you your double-indirection (IntPtr = void*, IntPtr* = void** - SafeHandle = IntPtr). You are still using a pointer, but it's a pointer to a managed type - which should be a little safer. Depending on whether it works. You could attempt an explicit conversion operator to HDF* in the event it doesn't.

Finally, if all else fails, it's more "C#'fy" to wrap the IntPtr in a class and use that as your strong reference. Try and logically 'map' those structs or pointer types to C# analogues. Here is an example (ignore the Rapture3D part, just look at how I wrapped OpenAL): https://github.com/jcdickinson/Rapture3DTester/tree/master/OpenALTest (keep in mind it was quick'n'dirty, so I don't use SafeHandle myself - I should be though).

Also! Remember C# is happy to convert a void*[] to a void** (thus an IntPtr[] to an IntPtr* thus to a void**).

Best of luck, I'll promote my vote to 5 because of your willingness to learn.

He who asks a question is a fool for five minutes. He who does not ask a question remains a fool forever. [Chinese Proverb]

We all always have something more to learn... Thanks for posting and discussing your experiences!

I'm confused by your second paragraph. I absolutely do "wrap my native C-code pointers in a C# managed class and use that managed class as my strong reference". That's what the HDF and CS managed classes are... managed lifetime wrappers around the native pointers. Just like your OpenAL wrapper, I don't expose native pointers out of my managed class.

Internally, however, I type the pointers as "unsafe struct *" instead of IntPtr, so when there are a bunch of types of pointers, I get type-checking to differentiate them. The unsafe-struct-pointers are internal (aka private to my wrapper), so nobody can see them.

In your OpenAL wrapper, you were forced to use IntPtr because that's how Tao.OpenAL exposed it, but you can see in your ALDevice.cs already that you have two different pointer types next to each other that are both typed as IntPtr (_device and _context).

Further, because of Tao's use and exposure of IntPtr, I can just call Alc.alcCreateContext with any random IntPtr and the compiler won't complain or help me find the bug. I think I can even just do something bogus like Alc.alcCreateContext(new IntPtr(666),...), which is sure to do bad things.

If I had written Tao.OpenAL, and had a good reason for exposing native pointers, the definition would have been something like...

This way, the compiler static checking could assure you properly passed either a Context or Device as required by the native calls.

---

Going back to the double-indirect comment... I'm not talking about indirecting the managed SafeHandle, I'm talking about how to use SafeHandle in the case of marshalling the native C function parameters as either HDF* or HDF**. For example, consider the following two C function prototypes...

struct Foo { };
void funcA(Foo **inouta);
void funcB(Foo *ina);

While I have not tested the following.... Searching around, it looks like these would be safehandle marshalled as...

(as for naming them HDF and CS instead of HierarchialDataFormat and ClearSilver, I did this because for our use it's more convenient to have the ClearSilver API nearly identical in Python/Ruby/Java/C# than it is to try to match naming conventions in each language. These details are far far outside the scope of the article.)

I understand the reason why SafeHandle was introduced. If the native C/C++ object only consumes memory resource, an IntPtr might be just enough because using SafeHandle slows the P/Invoke. Just my own experience.