I keep running into code that uses strcpy, sprintf, strncpy, _snprintf (Microsoft only), wcsncpy, swprintf, and morally equivalent functions. Please stop. There are alternatives which are far safer, and they actually require less typing.

The focus of this post is on fixed-size string buffers, but the technique applies to any type of fixed-length buffer. If you don’t use fixed-size buffers then this post is not relevant to you. Some people do use them, sometimes for valid reasons, and this post is for them.

The dangers of strcpy and sprintf should require little explanation I hope. Neither function lets you specify the size of the output buffer so buffer overruns are often a risk. Using strcpy to copy data from network packets or to copy a large array into a smaller one are particularly dangerous, but even when you’re certain that the string will fit, it’s really not worth the risk.

‘n’ functions considered dangerous

The dangers of strncpy, _snprintf, and wcsncpy should be equally well known, but apparently they are not. These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer. In order to use these functions correctly you have to do this sort of nonsense.

A non-terminated string in C/C++ is a time-bomb just waiting to destroy your code. My understanding is that strncpy was designed for inserting text into the middle of strings, and then was repurposed for ‘secure’ coding even though it is a terrible fit. Meanwhile _snprintf followed the strncpy pattern, but snprintf did not. That is, snprintf guarantees null-termination, but strncpy and _snprintf do not. Is it any wonder that developers get confused? Is it any wonder that developers often do this:

// Make snprintf available on Windows:// Don’t ever do this! These two functions are different!#define snprintf _snprintf

strlcpy and lstrcpy

strlcpy is designed to solve the null-termination problems – it always null-terminates. It’s certainly an improvement over strncpy, however it isn’t natively available in VC++.

lstrcpy is a similarly named Microsoft design defect that appears to behave like strlcpy but is actually a security bug. It uses structured exception handling to catch access violations and then return, so in some situations it will cover up crashes and give you a non-terminated buffer. Awesome.

Wide characters worse?

swprintf is a function that defies prediction. It lacks an ‘n’ in its name but it takes a character count, however it doesn’t guarantee null-termination. It’s enough to make one’s head explode.

Where’s the pattern?

If you find the list below obvious or easy to remember then you may be a prodigy, or a liar:

The documentation for these functions (man pages, MSDN) is typically pretty weak. I want bold letters at the top telling me whether it will null-terminate, but it typically takes a lot of very careful reading to be sure. It’s usually faster to write a test program.

It’s also worth emphasizing that of the seven functions listed above only one is even plausibly safe to use. And it’s not great either.

More typing means more errors

But wait, it’s actually worse. Because it turns out that programmers are imperfect human beings, and therefore programmers sometimes pass the wrong buffer size. Not often – probably not much more than one percent of the time – but these mistakes definitely happen, and ‘being careful’ doesn’t actually help. I’ve seen developers pass hard-coded constants (the wrong ones), pass named constants (the wrong ones), used sizeof(the wrong buffer), or use sizeof on a wchar_t array (thus getting a byte count instead of character count). I even saw one piece of code where the address of the string was passed instead of the size, and due to a mixture of templates and casting it actually compiled! Passing sizeof() to a function that expects a character count is the most common failure, but they all happen – even snprintf and strlcpy are misused. Using annotations and /analyze can help catch these problems, but we can do so much better.

The solution is…

We are programmers, are we not? If the functions we are given to deal with strings are difficult to use correctly then we should write new ones. And it turns out it is easy. Here I present to you the safest way to copy a string to an array:

The syntax is a bit weird since it combines templating on an integral value (instead of a type) with passing an array by reference, both of which are unfamiliar to many programmers. For more information on passing arrays by reference see this stack overflow article. Or, you can use the template magic quite effectively without worrying about the details of how it works.

<note>Commenters correctly pointed out that strncpy followed by null-termination is not the ideal implementation of strcpy_safe because it is inefficient (strncpy will zero all bytes to the end of the buffer) and maybe cut UTF-8 characters in half. Fixing this is beyond the scope of this post, which was supposed to be focused on automatically inferring the buffer size through template magic. So, don’t forget to implement YourCopyNFunction – and maybe I’ll post a version next time.</note>

I challenge you to use this function incorrectly. You can make it crash by passing an invalid source pointer, but in many years of using this technique I have never seen a case where the buffer size was not inferred correctly. If you pass a pointer as the destination then, because the size cannot be inferred, the code will fail to compile. It only works with a static string buffer as destination (no std::string, or std::vector) but different overloads can be made for those destination types.

I think that strcpy_safe is (ahem) a perfect function. It is either used correctly, or it fails to compile. It. Is Perfect. And it’s only six lines. Five if you indent like K&R.

Because strcpy_safe is so tiny – it just calls strncpy and then stores a zero – it will inline automatically in VC++ and gcc in optimized builds. If you want to reduce code size further you could write a non-inline helper function (strlcpy?) that would do the null-termination and have strcpy_safe call this function. It’s up to you.

One could certainly debate the name – maybe you would rather call it acme_strcpy, or acme_strncpy_safe. I really don’t care. You could even call it strcpy and let template overloading magically improve the safety of your code.

Unicode

String truncation causes problems with UTF-8 encoding. If you want to truncate at a character (or is that code-point – I can’t remember) boundaries then need to add some extra code to scan backwards to a character boundary. It’s not too complicated, but it’s outside the scope of this post which was focused on using templates to infer array sizes.

Extrapolation

Similar wrappers can obviously be made for all of the string functions that you use. You can even invent new ones, like sprintf_cat_safe. In fact, when I write a member function that takes a pointer and a size I usually make it private and write a template wrapper to handle the size. It’s a versatile technique that you should get used to using. Templates aren’t just for writing unreadable meta-code.

String classes

Yes, to be clear, I am aware of the existence of std::string. For better or for worse most game developers try to avoid dynamically allocated memory and std::string generally implies just that. There are valid reasons to use character arrays (fewer allocations, better cache locality), even if those valid reasons are just that you’ve been handed a million lines of legacy code with security and reliability problems in all directions. strcpy_safe and friends are unique in that they let you improve code security and reliability with a simple s/strcpy/strcpy_safe/.

As I say at the top, if you don’t need to use fixed-length buffers then congratulations, this post does not apply to you.

You should be using the second option, or explain to me why the heck not. It is constructive laziness of the highest order. Doing less typing in order to create code that is less fragile seems like it just might be a good idea.

One could certainly argue that silently truncating is bad – if that is your policy then your wrapper function can assert, log, or abort(). Those are implementation specific details. However we can all agree that avoiding buffer overruns is progress.

Share this:

Like this:

LikeLoading...

Related

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs.
I also unicycle. And play (ice) hockey. And juggle.

60 Responses to Stop using strncpy already!

Well, your strcpy_safe is very nice, I see only two issues here:
– strncpy and friends usually used in c programs not in c++, so if someone still uses strncpy in c++… well, they must not be c++ programmers, but c programmer.
– the second thing, which I’m not sure about, that using template means that it will “generate” one function for every different kind of calls, isn’t it ? So if used heavily the code base can grow significantly ?!

C++ programmers still have valid uses for string arrays and they need to use something to fill them — and the standard library doesn’t have great options. What would you have them use?

Code bloat shouldn’t be an issue — the template function will inline and be no more expensive than doing the strncpy/manual-termination dance. If the template function called strlcpy or some equivalent then it would be perfectly efficient. I updated the post to address that issue — thanks for mentioning it.

strcpy_safe and strcpy_safe seems to be two different function for me. I’m just wondering if you can write something without templates, to workaround the size question… Funny that sometimes “trivial” problems are’t trivial. :)

Recompile with optimisations. Unoptimised C++ code is really different from even minimally optimised one; the trick of declaring a public copy constructor without defining it, and just depend on (N)RVO, tends to break, for instance. On both clang++ 4.0 and g++ 4.2, plain -O suffices to inline the template functions, and replace them with a single-pass initialisation from literals. In fact, the functions are recognized as pure and the issue is to prevent the buffers from being completely elided. (passing them to a dummy extern function works).

I forgot about strlcpy. It does guarantee null-termination, but it can still be misused by passing the wrong buffer size. On platforms that have strlcpy you should get strcpy_safe to call it. I’ve updated the post. Thanks.

(edited to fix critical error — I wrote “doesn’t” when I meant “does”)

That’s correct. That is a feature, not a bug. Automatic size inference won’t work for pointers so the code fails to compile so the developer has to manually solve that situation. Make the easy stuff easy, and save the effort for the trickier stuff.

Also, if you have a pointer you presumably have allocated memory, so use std::string.

strcpy_safe is also badly designed because it truncates but gives no way to detect truncation. Truncation is a *bug*, and any string routine that truncates without informing the caller is simply broken. Truncating and informing is (barely) acceptable, although awfully primitive.

If dynamic allocation is a concern, I would prefer a string library that backs strings with user-sizable static buffers, falling back on dynamic allocation only when necessary. Hopefully the tunable buffers would allow dynamic allocation to be avoided in enough cases that performance would be acceptable.

But if you really must resort throwing away part of the information the caller wanted, for God’s sake don’t do it silently.

Anonymous: zero sized arrays are illegal in C++. It might be wise to guard against that case anyway though (ie, gcc allows it as an extension).

I guess terminating and then using strncat is another way of implementing strcpy_safe, but it is certainly to inconvenient for using in leaf code. Any string copying solution that requires two lines of code and manual specification of the buffer size is dead-on-arrival.

uhm, stop using c strings or std::strings in your case is more like it mate, use bstring (http://bstring.sourceforge.net/) or facebook string(https://github.com/facebook/folly/blob/master/folly/docs/FBString.md#follyfbstringh), write your own string lib or what ever… but stop using null terminated.
Please do “#define byte char” as a gesture of what you intend on using char for. and strart thinking about chars in that term… c strings in most plain terms are not good for unicode, hey even something like wchar.h would make the situation better, so you don’t get internationalization problems like turkish keyboard people crashing your apps… so stop using std::string is more like it

Good point about UNICODE — adding proper UTF-8 truncation support to strcpy_safe would be a good idea. I’ve decided not to do it because it would complicate the article but I’ve added a note pointing out that issue.

strcpy_s has two potential disadvantages. One is the lack of portability, and the other is that strcpy_s’s policy is to terminate your application if the string won’t fit. To get truncation you need to use strncpy_s with the _TRUNCATE flag, and I think that truncation needs to be easier than that.

On the other hand, strcpy_s is what taught me the template array-size-inference trick.

“lstrcpy is a similarly named Microsoft design defect that appears to behave like strlcpy but is actually a security bug”. Yes, but it comes from 16-bit Windows (i.e. 1992 or even earlier), and its behaviour is retained for compatibility, so that’s a bit unfair.

Well, the lstrcpy design was broken when it was first written, so they have to take some responsibility for it. When they moved to 32-bit they could have removed it or fixed the design without fear of compatibility problems, but they didn’t. When they moved to 64-bit they could have removed it or fixed the design without fear of compatibility problems, but they didn’t.

Anytime that developers have to recompile their code you should take that as an opportunity to fix broken designs, so I don’t think it’s totally unfair.

And, fair or not, it is true that it is a security bug function, and developers need to know that. Mincing words would do nobody any good.

updated:
I used strncpy just to keep the implementation minimal, but your code is conceptually reasonable, but it doesn’t appear to guarantee null-termination, and it also fails to deal with UTF-8. Strings are hard.

I should have mentioned strcpy_s — I have in the past — but I was trying to keep the article as short as possible. Also, I find that a lot of developers don’t like the default termination-on-overflow behavior of strcpy_s, and invoking truncation takes away much of the elegance. And it’s not portable, regrettably.

I don’t think I singled out Microsoft for criticsm particularly — swprintf is a POSIX variant I believe and it is particularly odd.

nice post. Actually struggling with some legacy code with that problem. It’s in pure C though, so can unfortunately not use your template trick. Will however create a strcpy_safe function to make sure it’s at least prettier.

I prefer reserving the _safe function for functions that infer the buffer size. Perhaps better to use some other suffix or prefix? We’re considering going with _ptr for when you have a pointer (or otherwise have to specify the size) and _safe for when the size is inferred.

If I recall correctly, the reason why strncpy() is so stupid isn’t that it was meant for inserting in the middle of strings. That wouldn’t explain the other dumb thing it does, i.e. fill the remainder of the destination with zeroes. No, it’s much more horrible than that. Back in the stone age, there was this filesystem with fixed-size file names. The open() function expected a 12-character buffer (or so), and it compared all the characters in that buffer against the file name on disk (no null terminator, no name length field). So, in order to expose variable-length file names to the user, they used the convention that buffers had to be padded with 0, and they made a little function to help you do this: strncpy(). When you wanted to open a file, you did this:

strncpy(buffer, fileName, 11);
int fd = open(buffer);

I think it’s insulting that someone took this hack and put it into the standard library. Then again, the standard library contains strcpy() and sprintf(), so…

Also, I think people should stop mentioning the _s functions that Microsoft inflicted upon the world. None of the programmers who told me to use them actually knew what they do, because nobody bothers clicking on the inconspicuous “Parameter Validation” link in the documentation. Almost everybody thinks they are just the old functions with an added size parameter, but as you said, they aren’t. I’ve seen plenty of people use them on user input, expecting truncation and saying how they avoided a crash (without installing an invalid parameter handler, of course, because THEY DON’T KNOW ABOUT IT). It’s funny to see the feeling of betrayal which they get when you make them actually test the code with an oversize buffer, and it crashes (granted, a “secure” crash, not an exploitable buffer overflow, but still a denial of service).

Whatever you do, don’t install an invalid parameter handler. That seems like a really bad way to use the _s functions.

When I was at Microsoft I tried to convince them that they needed to make the truncation policy more accessible. Having to pass an extra parameter makes it too inconvenient, it should just be different function names — _st suffix instead of _s?

> The documentation for these functions (man pages, MSDN) is typically pretty weak. I want bold letters at the top telling me whether it will null-terminate, but it typically takes a lot of very careful reading to be sure.

So C11 (9899:2011) gives specifications for most or all of the ‘_s’ variants of the string functions, they are part of the official C11 standard but only as the set of optional extensions represented by the conditionally defined macro __STDC_LIB_EXT1__ … though the ‘_s’ versions of the string functions are not part of C++11 (14882:2011). I guess that would mean anyone wanting to know the exact behavior of the ‘_s’ version of a string function could reference the C11 specification, though there is no guarantee the Microsoft version of the function will behave the same as the C11 standard version (since the MSVC compiler is not C11 compliant).

I checked the C++14 draft and couldn’t find references to the ‘_s’ functions — where are they specified?

I think it would be good to have them widely available. They aren’t perfect, but being able to write “strcpy_s(buffer, pEvil);” and be safe from buffer overruns, or “strncpy_s(buffer, pEvil, _TRUNCATE);” to be safe with truncation would be great. Until then I’ll stick with my own template versions, which are at least simple enough to write.

This is why I’m so eagerly waiting for string_view and array_view which are coming in the next revision of the C++ standard. Passing around pointer/int pairs (or equivalently pointer/pointer) is asking for trouble and should be banned. I try to avoid it at all costs. Instead its better to wrap the invariant relationship within a type. Its pretty easy to write simple versions of these yourself.

I like it. This lets you put the translation from array/vector/string/whatever in one place and then have all of your templated functions use this universal definition. So, if you add support for a new type then all of the template strcpy_safe style functions get support for the new type for free.

The crucial thing is to not be passing the size manually, but avoiding that cleanly is certainly a great thing.

While str5cpy is better than strcpy and strncpy, it is still a terrible solution.

Any solution that requires the programmer to specify the size of the destination array when the compiler could (through template magic) infer that size is more verbose and error prone than it needs to be.

Str5 functions are intended to be used with C89 compliant compilers (like gcc), not C++. These functions were designed to be safer and easier to manipulate than str/strn/strl functions in the same contexts of use.
From my point of view, it’s not the specification of the buffer size which is error prone but the calculation of the number of char to be copied: sometimes it’s not the same value (substring copy): strn/strl functions suggest that they are able to copy all or the first bytes of a string but they only use one parameter.
Another problem with strl functions is the possible silent truncation.

“Manually specifying buffer sizes is more work to read”: auto-documented code is better for maintainability. Checking the return value of a function is also an extra work but produces a more robust code.