New string functions

I took the liberty of creating some new string functions for C.
The goal with these was:
- Safety. No buffer overflows.
- If there's not enough room in the buffer to complete the operation, return the space needed.
- The ability to use C++-like strings. It supports dynamically allocated strings that will handle memory management to make sure there's enough room for operations.
- Performance. Instead of using the old C-style way of using null-terminators for getting length, it keeps track of the length itself, leading to speedups (albeit slightly more memory use).
- Slight C++-compatibility (such as constructors, destructors and an iostream overload). Nothing more.

So here is the code:
[Snip: deleted; new code below.]

And now for criticism, please.
The functions can handle non-dynamic strings, in which case they will fail if no space is available. The 3rd parameter allows the caller to acquire the space needed for the operation to complete.
The strstr and strchr functions return the position, counted from the beginning of the string, at which the match was found. This is because returning a char* pointer into a dynamic string could lead to disastrous results. Therefore, to protect against mistakes, the position is returned, and a char* pointer can be obtained from the original string if required.
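
A minimal sketch of the position-returning search described above, assuming the string keeps its length next to the buffer. The names SStrSketch, FindChr and SSTR_NPOS are made up for illustration and are not taken from the snipped code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's string type. */
typedef struct {
    char  *Str;     /* buffer, kept null-terminated */
    size_t Length;  /* characters in use, excluding the terminator */
} SStrSketch;

#define SSTR_NPOS ((size_t)-1)  /* hypothetical "not found" value */

/* Return the offset of the first occurrence of ch, or SSTR_NPOS. */
size_t FindChr(const SStrSketch *s, char ch)
{
    const char *hit;

    if (s == NULL || s->Str == NULL)
        return SSTR_NPOS;
    hit = memchr(s->Str, (unsigned char)ch, s->Length);
    /* Only subtract once we know the search succeeded. */
    return hit ? (size_t)(hit - s->Str) : SSTR_NPOS;
}
```

An offset stays meaningful even if the buffer is later reallocated and moves, which a saved char* would not.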

Last edited by Elysia; 03-28-2009 at 05:46 AM.
Reason: Saving space for new code

i can't count the times i've tried to come up with a generic string library for c that's better than doing things manually. more power to you if you can pull it off, but if you're going to write a library for c, it should compile as c. yours doesn't.

- I think your CopyStrInt is way too complicated. Using an internal function to append and copy is fine, but the signature takes too many parameters. It shouldn't require more than a SStr*, a const SStr*, and perhaps another auxiliary parameter, while modifying the mutable SStr.

- By using strcpy, strchr, strstr, etc, you are explicitly negating the possibility of using embedded \0's in the string. Furthermore, since you have the length of the strings, the functions memcpy, memmove and memchr are faster and let you use nil characters.

- You are forgetting to check for NULL in essentially every function. strlen, for example, will most likely crash if you pass it a null pointer.

- IsEqual should negate the return value of strcmp, since strcmp returns zero on equal strings. You should use memcmp, anyway.
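
As a rough illustration of the memcmp suggestion, here is a length-aware equality check. BytesEqual is a hypothetical helper, not a function from the posted library; it takes the stored lengths as arguments, so embedded \0 characters are compared like any other byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if both buffers hold the same bytes, else 0. */
int BytesEqual(const char *a, size_t alen, const char *b, size_t blen)
{
    if (a == NULL || b == NULL)
        return 0;
    if (alen != blen)
        return 0;  /* different lengths can never be equal */
    /* memcmp returns 0 on a match, so compare against 0 to get a truth value. */
    return memcmp(a, b, alen) == 0;
}
```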

There are many C libraries for strings out there. My personal favourite is bstring, but I've seen others use Glib. I recommend taking a look at the already existing ones before making one yourself.

> - I think your CopyStrInt is way too complicated. Using an internal function to append and copy is fine, but the signature takes too many parameters. It shouldn't require more than a SStr*, a const SStr*, and perhaps another auxiliary parameter, while modifying the mutable SStr.

Well, the idea is that it will return the required space for the buffer to complete the operation if the string isn't dynamic and there's not enough space.
Since C doesn't support default arguments (at least C90 AFAIK), there's no way to exclude it.
But if you have an idea...
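
One common C convention for an optional result, sketched here under the assumption that a fixed buffer is being filled: take a pointer for the "space needed" value and let callers pass NULL when they do not care. CopyFixed and its parameters are hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-buffer copy. Returns 1 on success, 0 on failure.
 * If needed is not NULL, it receives the buffer size (terminator
 * included) that the copy requires, whether or not the copy succeeds. */
int CopyFixed(char *dst, size_t dstSize, const char *src, size_t srcLen,
              size_t *needed)
{
    if (needed != NULL)
        *needed = srcLen + 1;
    if (dst == NULL || src == NULL || dstSize < srcLen + 1)
        return 0;  /* nothing is written when the buffer is too small */
    memcpy(dst, src, srcLen);
    dst[srcLen] = '\0';
    return 1;
}
```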

> - By using strcpy, strchr, strstr, etc, you are explicitly negating the possibility of using embedded \0's in the string. Furthermore, since you have the length of the strings, the functions memcpy, memmove and memchr are faster and let you use nil characters.

I figured that one out. Why am I using strcpy which will use strlen to find the length?
I actually changed to memcpy.
But embedded \0... well, that might take a little more work. But it should be entirely doable.

> - You are forgetting to check for NULL in essentially every function. strlen, for example, will most likely crash if you pass it a null pointer.

That is good criticism. I wonder why the code analyzer doesn't detect that?

> - IsEqual should negate the return value of strcmp, since strcmp returns zero on equal strings. You should use memcmp, anyway.

Removed IsEqual and made CmpStr instead... I don't see the point of IsEqual really, since it should wrap the C-functions.

> There are many C libraries for strings out there. My personal favourite is bstring, but I've seen others use Glib. I recommend taking a look at the already existing ones before making one yourself.

While that's very nice and all, this is just a small side project that I do for fun.
Not really intending to compete with the "big" players out there.

As for my current code, I have actually transformed the C code into C++ code and packed it into a library, exposing a C interface.
This has numerous advantages, since C++ offers a lot more functionality to use and abuse, while the C interface means the code can still be designed for--and work with--C.
I should probably post that code in the C++ forum...

Its utility as SStr is short-lived, I think. You'll be converting to a normal C string a lot, and the way you do that is performance intensive and sloppy.

It's very frustrating to actually use unless you convert to C strings almost immediately.

Creating them might be troublesome, but C doesn't support exceptions, so I don't know how else to do it. Should it return the created string and have an extra argument for error return?

Plus there is a problem in that code. Pointer arithmetic on NULL is undefined. I'm surprised you apparently didn't look at the manual to understand exactly what string.h already does.

The code lacks checking for NULL pointers, true, but I've fixed that in my current code.

The strstr and strchr functions can--and probably will be--changed to more efficient implementations later. It would mean designing new code for finding strings without relying on strlen.
Aside from that, I don't see where the biggest problems lie - you fail to mention them.

> Creating them might be troublesome, but C doesn't support exceptions, so I don't know how
> else to do it. Should it return the created string and have an extra argument for error return?

I've been misunderstood. It's very hard to use your library with anything standard because you don't maintain terminating zeros. The thing that does return a normal C string, NewDStr, is poorly named and really performance intensive.

I should be able to use some function that returns the buffer because I need to read it, and foo->Str elsewhere I guess. I still would be concerned about copies of the same variable (specifically the ones that are a result of struct assignment, which is a shallow copy).
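
To make the shallow-copy concern concrete, here is a sketch assuming a struct that owns a heap buffer. SStrSketch and DeepCopy are hypothetical names. Plain struct assignment copies only the pointer, so both copies share one buffer; a deep copy has to allocate its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library's string type. */
typedef struct {
    char  *Str;
    size_t Length;
} SStrSketch;

/* Hypothetical deep copy: gives dst its own buffer. Returns 1 on success. */
int DeepCopy(SStrSketch *dst, const SStrSketch *src)
{
    char *p;

    if (dst == NULL || src == NULL || src->Str == NULL)
        return 0;
    p = malloc(src->Length + 1);
    if (p == NULL)
        return 0;
    memcpy(p, src->Str, src->Length);
    p[src->Length] = '\0';
    dst->Str = p;          /* caller must free() this buffer later */
    dst->Length = src->Length;
    return 1;
}
```

After `SStrSketch b = a;` the two structs have equal Str pointers, so freeing or reallocating through one leaves the other dangling.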

Of course, another option is just to make your library do more, but you wanted opinions now. Before you so much as used a test harness...

> The code lacks checking for NULL pointers, true, but I've fixed that in my current code.

Stop farting around and actually read the manual one of these days! The strstr and strchr functions return NULL if the needle is not in the haystack, and you do pointer arithmetic to compute your search results. Plus they also require zero terminated strings.
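
For illustration, a substring search that checks the strstr result before doing any arithmetic on it could look like the following. FindStr and NOT_FOUND are hypothetical names, and both arguments are assumed to be zero-terminated, as strstr requires.

```c
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)  /* hypothetical "no match" value */

/* Return the offset of needle within haystack, or NOT_FOUND. */
size_t FindStr(const char *haystack, const char *needle)
{
    const char *hit;

    if (haystack == NULL || needle == NULL)
        return NOT_FOUND;
    hit = strstr(haystack, needle);
    if (hit == NULL)
        return NOT_FOUND;  /* never subtract from a null pointer */
    return (size_t)(hit - haystack);
}
```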

> Creating them might be troublesome, but C doesn't support exceptions, so I don't know how
> else to do it. Should it return the created string and have an extra argument for error return?

> I've been misunderstood. It's very hard to use your library with anything standard because you don't maintain terminating zeros. The thing that does return a normal C string, NewDStr, is poorly named and really performance intensive.

Oh no, you misunderstand. It does maintain null-terminating zeroes. It just doesn't use strlen to get the length.
NewDString / NewStr returns a new SStr from a C-style string.

> I should be able to use some function that returns the buffer because I need to read it, and foo->Str elsewhere I guess.

I am uncertain what you mean here...

> I still would be concerned about copies of the same variable (specifically the ones that are a result of struct assignment, which is a shallow copy).

I suspect that playing around with pointers, like C-strings, might help with that.
I took another approach--I abstracted the data type much like the Windows API or any other API does, hiding the true type and making NewXStr return a pointer to an "unknown type" instead, which you can copy and pass around.
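
A minimal sketch of that kind of opaque handle, assuming the struct definition lives only in the implementation file. HSTR, NewStrSketch, StrLength, StrData and FreeStrSketch are hypothetical names, not the actual interface of the library.

```c
#include <stdlib.h>
#include <string.h>

/* What callers see (normally in the header): an incomplete type. */
typedef struct SStrImpl *HSTR;  /* hypothetical handle name */

/* What only the implementation file sees. */
struct SStrImpl {
    char  *Str;
    size_t Length;
};

/* Create a handle from a C string; returns NULL on failure. */
HSTR NewStrSketch(const char *text)
{
    HSTR h;
    size_t len;

    if (text == NULL)
        return NULL;
    len = strlen(text);
    h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->Str = malloc(len + 1);
    if (h->Str == NULL) {
        free(h);
        return NULL;
    }
    memcpy(h->Str, text, len + 1);  /* copies the terminator too */
    h->Length = len;
    return h;
}

/* Accessors: the length is stored, and the buffer is read-only outside. */
size_t StrLength(HSTR h) { return h ? h->Length : 0; }
const char *StrData(HSTR h) { return h ? h->Str : NULL; }

void FreeStrSketch(HSTR h)
{
    if (h != NULL) {
        free(h->Str);
        free(h);
    }
}
```

Copying an HSTR copies only the handle, so every copy refers to the same string and exactly one of them should be passed to FreeStrSketch.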

> Of course, another option is just to make your library do more, but you wanted opinions now. Before you so much as used a test harness...

Of course I made sure the code was working, but... I'm not so much a C programmer. I will probably have very little use for these.

> The code lacks checking for NULL pointers, true, but I've fixed that in my current code.

> Stop farting around and actually read the manual one of these days! The strstr and strchr functions return NULL if the needle is not in the haystack, and you do pointer arithmetic to compute your search results. Plus they also require zero terminated strings.

Hey now, of course I do read manuals.
Though I think I misunderstood your first criticism.
First, they ARE zero-terminated.
And secondly, I have fixed that little problem now.

Originally Posted by zacs7

Why not use size_t instead of unsigned int? :\
At least for one it matches the standard functions.

You might run into problems with C99 as bool is a standard type (see stdbool.h). Why not just check if bool is defined? :\

I did not know that bool was a define, that is why I asked if there was a macro to see if the code was compiling as C99...
So bool is a define... I added guards for that, too.
(Visual Studio doesn't have stdbool.h.)

> I did not know that bool was a define, that is why I asked if there was a macro to see if the code was compiling as C99...
> So bool is a define... I added guards for that, too.
> (Visual Studio doesn't have stdbool.h.)

That's because Visual Studio is not C99.

Problem you have is that

1) bool is not necessarily defined in C89 - or, if it is, it is often defined by user code so may have meanings incompatible with yours.
2) It is a keyword in C++, so cannot be #define'd
3) It is potentially defined, by <stdbool.h> in C99.
4) Most C++ compilers are also C89 compilers.
5) Some C89 compilers are not C++ compilers.
6) Some (relatively few) C++ compilers are C99 compilers.
7) Several C99 compilers are not C++ compilers.

Unfortunately, the treatment of bool is one of the areas of incompatibility the C99 standard introduced with C++.

A way to unambiguously detect a C99 compilation, incidentally, is to test the predefined macro __STDC_VERSION__. For a C99 compilation it will have the value 199901L (the value will be increased for future versions of the C standard). Plain C89 and C++ compilers - or, more accurately, their preprocessors - will not define that macro (unless they are also C99 compilers), although a compiler implementing the 1995 amendment to C89 defines it as 199409L, so compare the value rather than just checking that it is defined.
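
A sketch of how such guards might be arranged. The names sstr_bool, SSTR_TRUE, SSTR_FALSE, CompiledAsC99OrLater and IsEmptySketch are hypothetical; using a library-private name sidesteps any clash with a bool that user code has already defined.

```c
#include <stddef.h>

/* Choose the underlying boolean type for the language being compiled. */
#if defined(__cplusplus)
    typedef bool sstr_bool;   /* C++: bool is a keyword */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #include <stdbool.h>      /* C99 or later */
    typedef bool sstr_bool;
#else
    typedef int sstr_bool;    /* C89 fallback */
#endif

#define SSTR_TRUE  1
#define SSTR_FALSE 0

/* 1 when this file was compiled as C99 or later, 0 otherwise. */
int CompiledAsC99OrLater(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return 1;
#else
    return 0;
#endif
}

/* Example use of the private boolean type. */
sstr_bool IsEmptySketch(const char *s)
{
    return (s == NULL || s[0] == '\0') ? SSTR_TRUE : SSTR_FALSE;
}
```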


Hopefully, I've taken care of a lot of criticism.
I actually have added more than just string functions, but they do work with strings. The new, improved fgets, for example, now reads everything in the buffer, leaving nothing behind, and automatically strips the newline.
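
One possible shape for such an fgets wrapper, sketched under the assumption that "everything in the buffer" means the rest of the current input line. ReadLineSketch is a hypothetical name; the actual function in the library may behave differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: read one line into buf, strip the newline, and
 * discard whatever part of the line did not fit.
 * Returns 1 on success, 0 on end-of-file, error or bad arguments.
 * size is assumed to fit in an int, since fgets takes an int. */
int ReadLineSketch(char *buf, size_t size, FILE *fp)
{
    size_t len;
    int ch;

    if (buf == NULL || fp == NULL || size == 0)
        return 0;
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';  /* the whole line fit: strip the newline */
    } else {
        /* The line was longer than the buffer (or ended at EOF):
         * consume the remainder so the next read starts on a new line. */
        while ((ch = getc(fp)) != EOF && ch != '\n')
            ;
    }
    return 1;
}
```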