C++ - Using STL Strings at Win32 API Boundaries

The Win32 API exposes several features using a pure-C interface. This means there are no C++ string classes available natively to exchange text at the Win32 API boundaries. Instead, raw C-style character pointers are used. For example, the Win32 SetWindowText function has the following prototype (from the associated MSDN documentation at bit.ly/1Fkb5lw):

BOOL WINAPI SetWindowText(HWND hWnd, LPCTSTR lpString);

The string parameter is expressed in the form of LPCTSTR, which is equivalent to const TCHAR*. In Unicode builds (which have been the default since Visual Studio 2005 and should be used in modern Windows C++ applications), the TCHAR typedef corresponds to wchar_t (and SetWindowText maps to SetWindowTextW), so the prototype of SetWindowText reads as:

BOOL WINAPI SetWindowText(HWND hWnd, const wchar_t* lpString);

So, basically, the input string is passed as a constant (that is, read-only) wchar_t character pointer, with the assumption that the string pointed to is NUL-terminated, in a classical, pure-C style. This is a typical pattern for input string parameters passed at the Win32 API boundaries.

On the other side, output strings at the Win32 API boundaries are typically represented using two pieces of information: a pointer to a destination buffer, allocated by the caller, and a size parameter, representing the total size of the caller-provided buffer. An example is the GetWindowText function (bit.ly/1bAMkpA):

int WINAPI GetWindowText(HWND hWnd, LPTSTR lpString, int nMaxCount);

In this case, the information related to the destination string buffer (the “output” string parameter) is stored in the last two parameters: lpString and nMaxCount. The former is a pointer to the destination string buffer (represented using the LPTSTR Win32 typedef, which translates to TCHAR*, or wchar_t* in Unicode builds). The latter, nMaxCount, represents the total size of the destination string buffer, in wchar_ts; note that this value includes the terminating NUL character (don’t forget that those C-style strings are NUL-terminated character arrays).
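The following portable sketch makes the typedef mapping and the output-buffer contract concrete. It defines stand-ins for TCHAR, LPCTSTR and LPTSTR as they resolve in a Unicode build. On Windows, the real typedefs come from <windows.h>.

It also defines a hypothetical CopyTextStandIn function that mimics the GetWindowText contract described above:

- nMaxCount counts the terminating NUL, so at most nMaxCount - 1 characters are copied.
- The result is always NUL-terminated. GetWindowText likewise truncates text that does not fit.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for the Win32 typedefs as they resolve in a Unicode build.
// (On Windows they come from <windows.h>; redefined here so the sketch
// compiles on any platform.)
using TCHAR   = wchar_t;
using LPCTSTR = const TCHAR*;   // input string: read-only, NUL-terminated
using LPTSTR  = TCHAR*;         // output string: caller-provided, writable

static_assert(std::is_same<LPCTSTR, const wchar_t*>::value,
              "LPCTSTR is const wchar_t* in Unicode builds");
static_assert(std::is_same<LPTSTR, wchar_t*>::value,
              "LPTSTR is wchar_t* in Unicode builds");

// Hypothetical function following the GetWindowText output contract:
// nMaxCount is the total buffer size in wchar_ts, including the NUL.
// Copies at most nMaxCount - 1 characters, always NUL-terminates,
// and returns the number of characters copied (excluding the NUL).
int CopyTextStandIn(LPCTSTR source, LPTSTR lpString, int nMaxCount)
{
    assert(source != nullptr && lpString != nullptr);
    if (nMaxCount <= 0) {
        return 0;   // no room even for the terminator: write nothing
    }
    int copied = 0;
    while (copied < nMaxCount - 1 && source[copied] != L'\0') {
        lpString[copied] = source[copied];
        ++copied;
    }
    lpString[copied] = L'\0';
    return copied;
}
```

Passing a 4-wchar_t buffer for “Hello” yields “Hel” plus the NUL. That is why callers size the buffer from the text length plus one.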

Of course, using C++ instead of pure C is an extremely productive option for developing user-mode Windows code, and Windows applications in particular. In general, C++ raises the semantic level of the code and increases programmer productivity, without a negative impact on application performance. In particular, convenient C++ string classes are much better (easier, more productive, less bug-prone) than raw, C-like, NUL-terminated character arrays.

So the question now becomes: What kind of C++ string classes can be used to interact with the Win32 API layer, which natively exposes a pure-C interface?

Active Template Library (ATL)/Microsoft Foundation Class (MFC) Library CString

Well, the ATL/MFC CString class is an option. CString is very well integrated with C++ Windows frameworks such as ATL, MFC and Windows Template Library (WTL), which simplify Win32 programming using C++. So it makes sense to use CString to represent strings at the Win32 API platform-specific layer of C++ Windows applications if you use those frameworks. Moreover, CString offers convenient Windows platform-specific features, like being able to load strings from resources, and so forth; those are platform-dependent features that a cross-platform standard library like the Standard Template Library (STL) simply can’t offer, by definition. So, for example, if you need to design and implement a new C++ class, derived from some existing ATL or MFC class, definitely consider using CString to represent strings.

Standard STL Strings

However, there are cases where it’s better to use a standard string class at the interface of custom-designed C++ classes that make up Windows applications. For example, you may want to abstract away the Win32 API layer as soon as possible in your C++ code, preferring the use of an STL string class instead of Windows-specific classes like CString at the public interface of custom-designed C++ classes. So, let’s consider the case of text stored in STL string classes. At this point, you need to pass those STL strings across Win32 API boundaries (which expose a pure-C interface, as discussed at the beginning of this article). With ATL, WTL and MFC, the framework implements the “glue” code between the Win32 C interface layer and CString, hiding it under the hood, but this convenience isn’t available with STL strings.

For the purpose of this article, let’s assume the strings are stored in Unicode UTF-16 format, which is the default Unicode encoding for Windows APIs. If those strings used another format (such as Unicode UTF-8), they could be converted to UTF-16 at the Win32 API boundary, so the techniques discussed here would still apply. For these conversions, the Win32 MultiByteToWideChar and WideCharToMultiByte functions can be used. The former converts a Unicode UTF-8-encoded (“multi-byte”) string to a Unicode UTF-16 (“wide”) string; the latter performs the opposite conversion.

In Visual C++, the std::wstring type is well-suited to represent a Unicode UTF-16 string, because its underlying character type is wchar_t. In Visual C++, wchar_t has a size of 16 bits, the exact size of a UTF-16 code unit. Note that on other platforms, such as GCC on Linux, a wchar_t is 32 bits, so a std::wstring on those platforms is better suited to represent Unicode UTF-32-encoded text. To remove this ambiguity, C++11 introduced a new standard string type: std::u16string. This is a specialization of the std::basic_string class template with elements of type char16_t, that is, 16-bit code units.
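The following sketch illustrates both points by converting UTF-8 into a std::u16string. On Windows this conversion would normally go through MultiByteToWideChar (with CP_UTF8). This sketch swaps in a small hand-written decoder instead, so it compiles on any platform.

Utf8ToUtf16 is a hypothetical helper, and it is deliberately minimal: it rejects malformed and truncated sequences but not overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Minimal UTF-8 -> UTF-16 conversion into char16_t code units.
// Portable stand-in for MultiByteToWideChar(CP_UTF8, ...); it does not
// reject overlong forms or encoded surrogates.
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::u16string utf16;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t codePoint = 0;
        std::size_t length = 0;
        if (lead < 0x80)              { codePoint = lead;        length = 1; }
        else if ((lead >> 5) == 0x06) { codePoint = lead & 0x1F; length = 2; }
        else if ((lead >> 4) == 0x0E) { codePoint = lead & 0x0F; length = 3; }
        else if ((lead >> 3) == 0x1E) { codePoint = lead & 0x07; length = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + length > utf8.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        // Each continuation byte (10xxxxxx) contributes 6 bits.
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char next = static_cast<unsigned char>(utf8[i + k]);
            if ((next >> 6) != 0x02) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        i += length;

        if (codePoint > 0x10FFFF) {
            throw std::invalid_argument("code point out of Unicode range");
        }
        if (codePoint < 0x10000) {
            // Basic Multilingual Plane: a single UTF-16 code unit.
            utf16.push_back(static_cast<char16_t>(codePoint));
        } else {
            // Supplementary plane: encode as a surrogate pair.
            const char32_t offset = codePoint - 0x10000;
            assert(offset <= 0xFFFFF);  // 20 bits, split 10 + 10
            utf16.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
            utf16.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
        }
    }
    return utf16;
}
```

A character outside the Basic Multilingual Plane, such as U+1F600, becomes two char16_t code units (a surrogate pair). As a result, a UTF-16 string's size() counts code units, not characters.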

The Input String Case

If a Win32 API expects a PCWSTR (or LPCWSTR in older terminology), that is, a const wchar_t* NUL-terminated C-style input string parameter, simply calling the std::wstring::c_str method will be just fine. In fact, this method returns a pointer to a read-only NUL-terminated C-style string.

For example, to set the text of a window’s titlebar or the text of a control using the content stored in a std::wstring named text, the SetWindowText Win32 API can be called like this (hWnd being the handle of the target window):

::SetWindowText(hWnd, text.c_str());

Note that, while the ATL/MFC CString offers an implicit conversion to a raw character const pointer (const TCHAR*, which is equivalent to const wchar_t* in modern Unicode builds), STL strings do not offer such an implicit conversion. Instead, you must make an explicit call to the STL string’s c_str method. There’s a common understanding in modern C++ that implicit conversions tend not to be a good thing, so the designers of STL string classes opted for an explicitly callable c_str method. (You’ll find a related discussion on the lack of implicit conversion in modern STL smart pointers in the blog post at bit.ly/1d9AGT4.)
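The following portable sketch shows the input case. SetTextStandIn is a hypothetical stand-in for SetWindowText that copies the caption into a global variable.

The static_assert checks the point made above: std::wstring does not convert implicitly to const wchar_t*, so c_str() has to be spelled out at the call site.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical storage standing in for the window's caption.
std::wstring g_windowText;

// Hypothetical stand-in for an input-string API such as SetWindowText:
// it receives a read-only, NUL-terminated wchar_t pointer.
bool SetTextStandIn(const wchar_t* lpString)
{
    assert(lpString != nullptr);
    g_windowText = lpString;   // reads characters up to the NUL terminator
    return true;
}

// std::wstring offers no implicit conversion to const wchar_t*.
static_assert(!std::is_convertible<std::wstring, const wchar_t*>::value,
              "std::wstring does not convert implicitly to const wchar_t*");

void UpdateCaption(const std::wstring& text)
{
    // On Windows this would be: ::SetWindowText(hWnd, text.c_str());
    SetTextStandIn(text.c_str());
}
```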

The Output String Case

Things become a little more complicated with output strings. The usual pattern consists of first calling a Win32 API to get the size of the destination buffer for the output string. Whether that size includes the terminating NUL depends on the particular Win32 API, so check its documentation.

Then, a buffer of proper size is allocated dynamically by the caller. The size of that buffer is the size determined in the previous step.

And, finally, another call is made to a Win32 API to read the actual string content into the caller-allocated buffer.

For example, to retrieve the text of a control, the GetWindowTextLength API can be invoked to get the length, in wchar_ts, of the text string. (Note that, in this case, the returned length does not include the terminating NUL.)

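Then, a string buffer can be allocated using that length, plus one wchar_t for the terminating NUL. One option here is to use a std::vector<wchar_t> to manage the string buffer, for example:

const int bufferLength = ::GetWindowTextLength(hWnd) + 1; // +1 for the terminating NUL
std::vector<wchar_t> buffer(bufferLength);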

Note that this is simpler than using a raw “new wchar_t[bufferLength]” call, because that would require properly releasing the buffer with a call to delete[] (and forgetting to do that would cause a memory leak). Using std::vector is simpler, even though vector has a small overhead compared to a raw new[] call, because the std::vector destructor automatically releases the allocated buffer.

This also helps in building exception-safe C++ code: If an exception is thrown somewhere in the code, the std::vector destructor would be automatically called. Instead, a buffer dynamically allocated with new[], whose pointer is stored in a raw owning pointer, would be leaked.

Another option, as an alternative to std::vector, is std::unique_ptr, in particular std::unique_ptr<wchar_t[]>. This option has the automatic destruction (and exception safety) of std::vector, thanks to std::unique_ptr’s destructor. It also has less overhead than std::vector, because std::unique_ptr is a very thin C++ wrapper around a raw owning pointer. Basically, unique_ptr is an owning pointer protected within safe RAII boundaries. RAII (bit.ly/1AbSa6k) is a very common C++ programming idiom. If you’re unfamiliar with it, just think of RAII as an implementation technique that automatically calls delete[] on the wrapped pointer (for example, in unique_ptr’s destructor), releasing the associated resources and preventing memory leaks (and resource leaks, in general).
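Below is a minimal sketch of the std::unique_ptr<wchar_t[]> variant. It assumes a hypothetical FillTextStandIn function in place of the real Win32 call.

Note that std::unique_ptr<wchar_t[]>(new wchar_t[n]) leaves the characters uninitialized. By contrast, std::vector<wchar_t>(n) and C++14's std::make_unique<wchar_t[]>(n) zero-initialize them, which is one source of vector's small extra cost.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a Win32 API that writes a NUL-terminated
// string into a caller-provided buffer of nMaxCount wchar_ts.
int FillTextStandIn(wchar_t* lpString, int nMaxCount)
{
    static const wchar_t source[] = L"Sample text";
    assert(lpString != nullptr && nMaxCount > 0);
    int copied = 0;
    while (copied < nMaxCount - 1 && source[copied] != L'\0') {
        lpString[copied] = source[copied];
        ++copied;
    }
    lpString[copied] = L'\0';
    return copied;
}

std::wstring ReadTextWithUniquePtr(int textLength)
{
    // +1 for the terminating NUL written by the API.
    const int bufferLength = textLength + 1;

    // unique_ptr owns the array: its destructor calls delete[] on every
    // exit path, including when an exception propagates.
    std::unique_ptr<wchar_t[]> buffer(new wchar_t[bufferLength]);

    // get() exposes the raw pointer expected by the C-style API.
    FillTextStandIn(buffer.get(), bufferLength);

    // Deep copy the NUL-terminated text; buffer is freed automatically.
    return std::wstring(buffer.get());
}
```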

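Then, once a buffer of proper size is allocated and ready for use, the GetWindowText API can be called, passing a pointer to that string buffer. To get a pointer to the beginning of the raw buffer managed by the std::vector, the std::vector::data method (bit.ly/1I3ytEA) can be used, like so:

::GetWindowText(hWnd, buffer.data(), bufferLength);

With std::unique_ptr<wchar_t[]>, the get method returns the raw pointer instead. Finally, the text can be deep copied from the temporary buffer into a std::wstring: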

std::wstring text(buffer.data()); // When buffer is a std::vector<wchar_t>
std::wstring text(buffer.get()); // When buffer is a std::unique_ptr<wchar_t[]>

In the preceding code snippet, I used a constructor overload of wstring, taking a constant raw wchar_t pointer to a NUL-terminated input string. This works just fine, because the called Win32 API will insert a NUL terminator in the destination string buffer provided by the caller.

As a slight optimization, if the length of the string (in wchar_ts) is known, a wstring constructor overload taking a pointer and a character count can be used instead. One example is the value returned by GetWindowText, which is the number of characters copied, excluding the NUL. In this case, the string length is provided at the call site, so the wstring constructor doesn’t need to compute it (typically with an O(N) operation, like calling wcslen in a Visual C++ implementation).
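Putting the three steps together with std::vector, a sketch might look like the following. GetTextLengthStandIn and GetTextStandIn are hypothetical stand-ins with the same contracts as GetWindowTextLength and GetWindowText, so the code runs without a real window.

The final copy uses the pointer-plus-count wstring constructor, passing the count returned by the read call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical control text read by the stand-ins below.
static const std::wstring g_controlText = L"Hello from a control";

// Like GetWindowTextLength: length in wchar_ts, excluding the NUL.
int GetTextLengthStandIn()
{
    return static_cast<int>(g_controlText.size());
}

// Like GetWindowText: writes at most nMaxCount - 1 characters plus a NUL,
// and returns the number of characters copied (excluding the NUL).
int GetTextStandIn(wchar_t* lpString, int nMaxCount)
{
    assert(lpString != nullptr && nMaxCount > 0);
    const int length = GetTextLengthStandIn();
    int copied = 0;
    while (copied < nMaxCount - 1 && copied < length) {
        lpString[copied] = g_controlText[copied];
        ++copied;
    }
    lpString[copied] = L'\0';
    return copied;
}

std::wstring ReadControlText()
{
    // Step 1: query the length; it excludes the NUL, so add one.
    const int bufferLength = GetTextLengthStandIn() + 1;

    // Step 2: allocate the buffer; std::vector releases it automatically.
    std::vector<wchar_t> buffer(bufferLength);

    // Step 3: read the text through the pointer returned by vector::data().
    const int copied = GetTextStandIn(buffer.data(), bufferLength);

    // Deep copy with the pointer + count constructor: no scan for the NUL.
    return std::wstring(buffer.data(), static_cast<std::size_t>(copied));
}
```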

A Shortcut for the Output Case: Working in Place with std::wstring

Instead of allocating a temporary string buffer using a std::vector (or a std::unique_ptr) and then deep copying it into a std::wstring, you can take a shortcut.

Basically, an instance of std::wstring could be used directly as a destination buffer to pass to Win32 APIs.

In fact, std::wstring has a resize method that can be used to build a string of proper size. Note that in this case, you don’t care about the actual initial content of the resized string, because it will be overwritten by the invoked Win32 API. Figure 1 contains a sample code snippet showing how to read strings in place using std::wstring.

A C++ programmer might be tempted to use the std::wstring::data method to access the internal string content, via a pointer to be passed to the GetWindowText call. But in C++11 and C++14, wstring::data returns only a const pointer, which doesn’t allow the content of the internal string buffer to be modified (C++17 adds a non-const overload). Because GetWindowText expects write access to the content of the wstring, that call wouldn’t compile. So, an alternative is to use the &text[0] syntax to get the address of the beginning of the internal string buffer, to be passed as an output (that is, modifiable) string to the desired Win32 API. This works because C++11 guarantees that std::basic_string storage is contiguous.
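Here is a portable sketch of the in-place technique. It follows the steps described for Figure 1 but is not the figure's listing:

1. Size the wstring with room for the API's NUL.
2. Pass &text[0] to the API.
3. Resize down.

GetTextLengthStandIn and GetTextStandIn are hypothetical substitutes for GetWindowTextLength and GetWindowText.

```cpp
#include <cassert>
#include <string>

// Hypothetical caption read by the stand-ins below.
static const wchar_t g_caption[] = L"In-place read";

// Like GetWindowTextLength: length in wchar_ts, excluding the NUL.
int GetTextLengthStandIn()
{
    int length = 0;
    while (g_caption[length] != L'\0') {
        ++length;
    }
    return length;
}

// Like GetWindowText: writes at most nMaxCount - 1 characters plus a NUL.
int GetTextStandIn(wchar_t* lpString, int nMaxCount)
{
    assert(lpString != nullptr && nMaxCount > 0);
    int copied = 0;
    while (copied < nMaxCount - 1 && g_caption[copied] != L'\0') {
        lpString[copied] = g_caption[copied];
        ++copied;
    }
    lpString[copied] = L'\0';
    return copied;
}

std::wstring ReadTextInPlace()
{
    // +1 so the API has room to write its NUL terminator.
    const int bufferLength = GetTextLengthStandIn() + 1;

    // Size the string itself; its initial content is overwritten anyway.
    std::wstring text;
    text.resize(bufferLength);

    // &text[0] is a modifiable pointer to the contiguous internal buffer.
    GetTextStandIn(&text[0], bufferLength);

    // Chop off the API's NUL, keeping only wstring's own terminator.
    text.resize(bufferLength - 1);
    return text;
}
```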

Compared to the previous approach, this technique is more efficient because there’s no temporary std::vector, with a buffer first allocated, then deep copied into a std::wstring and, finally, discarded. In fact, in this case, the code just operates in place in a std::wstring instance.

Avoiding Bogus Double-NUL-Terminated Strings

Pay attention to the last line of code in Figure 1:

text.resize(bufferLength - 1);

The initial wstring::resize call, text.resize(bufferLength), has no “-1” correction; bufferLength already includes one extra wchar_t for the NUL. That call allocates enough room in the internal wstring buffer for the GetWindowText Win32 API to scribble in its NUL terminator. However, in addition to this NUL terminator written by GetWindowText, std::wstring implicitly provides another NUL terminator. So, the resulting string ends up double-NUL-terminated: the NUL terminator written by GetWindowText is counted as part of the string content, and it is followed by the NUL terminator automatically added by wstring.

To fix this bogus double-NUL-terminated string, the wstring instance can be resized down to chop off the NUL terminator written by the Win32 API, leaving only wstring’s own NUL terminator. This is the purpose of the text.resize(bufferLength - 1) call.
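The effect of skipping the resize-down step is easy to observe in a sketch. WriteTextStandIn is a hypothetical GetWindowText-like writer. ReadInPlace is a hypothetical helper whose flag turns the final resize on or off.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in with a GetWindowText-like contract: writes at most
// nMaxCount - 1 characters of source, then a NUL terminator.
int WriteTextStandIn(const std::wstring& source, wchar_t* lpString, int nMaxCount)
{
    assert(lpString != nullptr && nMaxCount > 0);
    int copied = 0;
    while (copied < nMaxCount - 1 && copied < static_cast<int>(source.size())) {
        lpString[copied] = source[copied];
        ++copied;
    }
    lpString[copied] = L'\0';
    return copied;
}

// Reads source in place into a wstring sized for the API's NUL terminator;
// chopApiNul controls whether the final resize-down step is performed.
std::wstring ReadInPlace(const std::wstring& source, bool chopApiNul)
{
    const int bufferLength = static_cast<int>(source.size()) + 1;
    std::wstring text;
    text.resize(bufferLength);
    WriteTextStandIn(source, &text[0], bufferLength);
    if (chopApiNul) {
        text.resize(bufferLength - 1);   // keep only wstring's own NUL
    }
    return text;
}
```

Without the fix, reading “abc” yields a string whose size() is 4 and which compares unequal to L"abc". Anything appended to it lands after an embedded NUL, where a C API reading c_str() never sees it.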

Handling a Race Condition

Before concluding this article, it’s worth discussing how to handle a potential race condition that may arise with some APIs. For example, suppose you have code that reads a string value from the Windows Registry. Following the pattern shown in the previous section, a C++ programmer would first call the RegQueryValueEx function to get the length of the string value. Then, a buffer for the string would be allocated, and finally RegQueryValueEx would be called a second time, to read the actual string value into the buffer created in the previous step.

The race condition that could arise in this case is another process modifying the string value between the two RegQueryValueEx calls. The string length returned by the first call could then be stale, unrelated to the new string value written in the Registry by the other process. So, the second call to RegQueryValueEx would try to read the new string into a buffer allocated with the wrong size.

To fix that bug, you can use a coding pattern like the one in Figure 2.

The use of the while loop in Figure 2 ensures that the string is read in a buffer of proper length, because each time ERROR_MORE_DATA is returned, a new buffer is allocated with the proper bufferLength value until the API call succeeds (returning ERROR_SUCCESS) or fails for a reason other than providing an insufficiently sized buffer.
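A portable sketch of the retry loop follows. QueryValueStandIn is a hypothetical function loosely modeled on RegQueryValueEx:

- A nullptr data pointer asks for the required size.
- A too-small buffer yields ERROR_MORE_DATA plus the new required size.
- To reproduce the race, the stored value grows between the size query and the first read.

As simplifications, sizes are counted in wchar_ts rather than bytes, and the stored value is assumed to be NUL-terminated, which real Registry data may not be.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in status codes (same values as ERROR_SUCCESS and ERROR_MORE_DATA
// in <winerror.h>).
constexpr long kSuccess  = 0;
constexpr long kMoreData = 234;

// Hypothetical Registry value and a call counter used to simulate the race.
std::wstring g_value = L"Original value";
int g_callCount = 0;

// Hypothetical stand-in loosely modeled on RegQueryValueEx for a string value.
long QueryValueStandIn(wchar_t* data, int* sizeInChars)
{
    assert(sizeInChars != nullptr);

    // Simulate another process rewriting the value after the size query
    // (first call) and before the first read (second call).
    if (++g_callCount == 2) {
        g_value += L" (updated by another process)";
    }

    const int required = static_cast<int>(g_value.size()) + 1;  // + NUL
    if (data == nullptr) {           // size query only
        *sizeInChars = required;
        return kSuccess;
    }
    if (*sizeInChars < required) {   // caller's buffer is too small
        *sizeInChars = required;
        return kMoreData;
    }
    g_value.copy(data, g_value.size());
    data[g_value.size()] = L'\0';
    *sizeInChars = required;
    return kSuccess;
}

std::wstring ReadValueWithRetry()
{
    // First call: query the required size, in wchar_ts including the NUL.
    int bufferLength = 0;
    long result = QueryValueStandIn(nullptr, &bufferLength);
    if (result != kSuccess) {
        throw std::runtime_error("size query failed");
    }

    std::wstring text;
    text.resize(bufferLength);
    result = QueryValueStandIn(&text[0], &bufferLength);

    // If the value grew after the size query, bufferLength now holds the
    // new required size: reallocate and try again.
    while (result == kMoreData) {
        text.resize(bufferLength);
        result = QueryValueStandIn(&text[0], &bufferLength);
    }
    if (result != kSuccess) {
        throw std::runtime_error("read failed");
    }

    // bufferLength counts the NUL written by the API: chop it off.
    text.resize(bufferLength - 1);
    return text;
}
```

In this simulation, the first read fails with the stale size and the second read succeeds. For an API that reports a short buffer with ERROR_INSUFFICIENT_BUFFER, only the status code compared in the while condition would change.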

Note that the code snippet in Figure 2 is just example skeleton code; other Win32 APIs could use different error codes related to an insufficient buffer provided by the caller, for example, the ERROR_INSUFFICIENT_BUFFER code.

Wrapping Up

While the use of CString at the Win32 API boundary—with the help of frameworks like ATL/WTL and MFC—hides the mechanics of interoperation with the Win32 pure-C-interface layer, when using STL strings the C++ programmer must pay attention to certain details. In this article, I discussed some coding patterns for the interoperation of the STL wstring class and Win32 pure-C interface functions. In the input case, calling wstring’s c_str method is just fine to pass an input string at the Win32 C interface boundary, in the form of a simple constant (read-only) NUL-terminated string character pointer. For output strings, a temporary string buffer must be allocated by the caller. This can be achieved using either the std::vector STL class or, with slightly less overhead, the STL std::unique_ptr smart pointer template class. Another option is to use the wstring::resize method to allocate some room inside the string instance as a destination buffer for Win32 API functions. In this case, it’s important to specify enough space to allow the invoked Win32 API to scribble in its NUL terminator, and then resize down to chop that off and leave only wstring’s NUL terminator. Finally, I covered a potential race condition, and presented a sample coding pattern to solve that race condition.

Giovanni Dicanio is a computer programmer specializing in C++ and the Windows OS, a Pluralsight author and a Visual C++ MVP. Besides programming and course authoring, he enjoys helping others on forums and communities devoted to C++, and can be contacted at giovanni.dicanio@gmail.com.

Thanks to the following technical experts for reviewing this article: David Cravey (GlobalSCAPE) and Stephan T. Lavavej (Microsoft). David Cravey is an Enterprise Architect at GlobalSCAPE, leads several C++ user groups, and was a four-time Visual C++ MVP. Stephan T. Lavavej is a Senior Developer at Microsoft. Since 2007, he's worked with Dinkumware to maintain Visual C++'s implementation of the C++ Standard Library. He also designed a couple of C++14 features: make_unique and the transparent operator functors.