Dynamic string handling (C)

Okay I've been trying to figure this out but without much luck. I'm trying to write a simple program that takes input and manipulates it. Easy and that is already done (I'm using C).

The problem is I don't want any hard coded limits on the length of the string in the program and all of the string functions seem to require an array of chars which is also fine. But I can't find a function which will just count the number of characters in a string and put the result into an int which I can then use to specify the size of the array of chars.

The only other way I can think of doing it would be to use getch() to get the characters one by one and then just have a simple counter while I am running a while loop or something. The problem is I still need to declare an array for getch to work which then adds a hardcoded limit to the length of the string my program will handle. Likewise strnlen() counts the length of a given string but it will only count it from an already existing buffer.

look at strlen to find the length of the string, see man strlen for more information.

Yep, I already mentioned strlen in my original post, but it is not dynamic.

Code:

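int x = 0;
char buffer[x];      /* zero-length array: no room for any input */
x = strlen(buffer);  /* strlen() can only measure a buffer that is already filled */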

You can see the problem. I can't set the value of x without having a character buffer. If I declare a character buffer to then set the size of another character buffer, the program is limited to the size of the original character buffer.

If you need variable length strings you'll need malloc. Malloc (n+1)*sizeof(char) bytes, as you need an extra one for the null terminator.
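A minimal sketch of that n+1 rule, assuming the text already exists somewhere as a C string. The name copy_string is made up for illustration and is not from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src,
   or NULL if the allocation fails. The caller must free() it. */
char *copy_string(const char *src)
{
    size_t n;
    char *copy;

    assert(src != NULL);
    n = strlen(src);                        /* characters, not counting '\0' */
    copy = malloc((n + 1) * sizeof(char));  /* +1 for the null terminator */
    if (copy == NULL)
        return NULL;

    memcpy(copy, src, n + 1);               /* copies the '\0' as well */
    return copy;
}
```

Note the parentheses around n + 1. Without them only the 1 is multiplied by sizeof(char), which happens to be harmless for char but is wrong for any wider type.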

Not sure I follow you here. Would I use something like that instead of an array of chars? Or would I use malloc to set the size of the array? I have not looked into malloc and memory management in general much.

You use "malloc()" to allocate memory dynamically (ie at run time), and not at compile time.

Instead of doing something like this:

char buffer[100];

do this:

char *buffer;

buffer=malloc(100);

This makes buffer point to a 100 byte memory segment - the end result is the same, but it was assigned dynamically, and not at compile time.

Malloc is usually used with free(), which returns the memory allocated by malloc when you are done with it. Just make sure you are really done with it before you call it like this:

free(buffer);

Once you do this, you can't use the memory referenced by buffer until you do another malloc.

As you can imagine, managing malloc/free is a bit tricky and a common source of errors. If you have ever heard of a "memory leak", that comes from doing malloc() without free() to give the memory back.

...But I can't find a function which will just count the number of characters in a string and put the result into an int which I can then use to specify the size of the array of chars.

strlen() does that, as Erasehead points out.

I think there's a terminology thing going on. A "string" in C is defined as a sequence of characters that ends in a binary zero. If you are indeed accepting a string into your program, you can indeed use strlen().

If, however, you are reading a character at a time from stdin (which your mention of getch() implies), and you don't want to constrain yourself with any limits on length of the string, then that's fine - you can still do that.

You have a few choices, but the first, and probably easiest, is to declare a hardcoded array of some large but reasonable length, maybe 1,001 characters. Then start your getch()'ing. When you get to 1000 and you're not done, do a malloc() for 2001 bytes. Copy your data over into the malloc'ed area, and keep getch()'ing. If you fill up again, malloc() a bigger area, copy the data across, and free() the previous malloc()'ed area.

(Look up realloc() as well - might save you a few steps if your malloc'ed area becomes too small)

When you get what you determine is the last character, put a binary zero on the end of the array and you've created a "string" for yourself.
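The approach above can be sketched roughly as follows. This is one possible version, not the poster's code: it starts with a small malloc()ed buffer instead of a fixed array, doubles it with realloc() when it fills, and uses the standard getc() rather than getch(). The name read_line and the starting size of 16 are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from `in`.
   Returns a malloc()ed string without the newline, or NULL on
   allocation failure or if end-of-file is hit before any input. */
char *read_line(FILE *in)
{
    size_t cap = 16;            /* current buffer size (arbitrary start) */
    size_t len = 0;             /* characters stored so far */
    char *buf;
    int c;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {   /* keep one slot free for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) { /* nothing left to read */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';            /* the binary zero makes it a string */
    return buf;
}
```

Calling read_line(stdin) gives you a string whose length is limited only by available memory, and strlen() works on the result as usual. Remember to free() it when you are done.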

I think there's a terminology thing going on. A "string" in C is defined as a sequence of characters that ends in a binary zero. If you are indeed accepting a string into your program, you can indeed use strlen().

You misunderstand me. I am well aware of strlen() and its uses. I could already implement the following code to deal with this :

Code:

int x;
char buffer[100];
gets(buffer);
x = strlen(buffer);

BUT, the problem with that code is that the size of the string is limited to the size of the array buffer. I do not want any hard coded limit on the size of my string.

Therefore strlen() is not an option as it requires a buffer to be declared before you can use it. Therefore you need to set an arbitrary size of the buffer in advance which means that you already have a hard coded limit on the size of the string that your program can accept. Setting the size of the buffer to a stupidly large number is not an option.

I usually allocate strings at some sensible size using malloc(), and as I close in on that size I realloc() the string to something bigger, say +1000. Once I have all the data in, I realloc() it one more time to set the string to the length I want. Allocating a string to be quite big saves you a lot of realloc'ing. Each realloc() may copy the contents of the original array into a new location, so once strings get large this can have a detrimental effect on performance. Try to do it as few times as possible within a loop. Also, rather than using strlen() each time to keep an eye on the size, store the length in an int. Just use strlen() for the final realloc() to set the size to the exact length needed (plus one for the terminator). OK, so maybe that is teaching you to suck eggs, but it is good practice so worth mentioning.
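Here is a rough sketch of that pattern: grow in steps, keep the length in a variable, and trim once at the end. The struct and function names (dynstr, dynstr_append, dynstr_shrink) are invented for this example, and it uses size_t rather than int for the length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GROW_STEP 1000          /* the "+1000" from the post; tune to taste */

/* Hypothetical growable string. len is tracked here so strlen()
   is never needed inside a loop. Start with {NULL, 0, 0}. */
struct dynstr {
    char  *data;
    size_t len;                 /* characters stored, excluding '\0' */
    size_t cap;                 /* bytes allocated */
};

/* Appends n bytes from src. Returns 0 on success, -1 on failure
   (in which case the existing contents stay valid). */
int dynstr_append(struct dynstr *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (s->len + n + 1 > s->cap) {
        size_t new_cap = s->len + n + 1 + GROW_STEP;
        char *bigger = realloc(s->data, new_cap); /* realloc(NULL, n) acts like malloc(n) */
        if (bigger == NULL)
            return -1;
        s->data = bigger;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Final realloc() down to the exact size needed. */
void dynstr_shrink(struct dynstr *s)
{
    assert(s != NULL);
    if (s->data != NULL) {
        char *exact = realloc(s->data, s->len + 1);
        if (exact != NULL) {    /* on failure the old block is still valid */
            s->data = exact;
            s->cap = s->len + 1;
        }
    }
}
```

When you are finished with the string, free(s.data) releases it.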

Oh, and always remember to free() the memory when you don't need it anymore. C doesn't do garbage collection like Java so your memory management has to be meticulous, especially when you are doing dynamic allocation on a large scale. Many of my programs do biological sequence comparison so I allocate and free a lot of string arrays.

You have to set length to something before using it. Otherwise its value will be whatever happened to be in that memory before (C does not zero local variables for you; only static and global variables are guaranteed to start at 0). This is not a good idea.

Ah I see. Thanks for that, I always forget to initialise my variables.

When you say this is not a good idea, do you mean not initialising variables or do you mean the entire method?

I believe Robbie meant it's not a good idea to not initialize variables.

Another thing that's not a good idea is to use gets(). gets() will accept a string of any length, and there is no way to tell it how big your buffer is. That makes it the perfect candidate for enabling buffer overruns and such. If you ran your program, and the user copy/pasted in their input, it could easily cause your malloc()ed storage to be overrun, causing a crash or other undesired effect. http://www.cppreference.com/stdio/gets.html

Use fgets() instead, which takes the size of your buffer. If fgets() reaches the end of the line, it stores the newline character in the buffer followed by the null terminator. Otherwise, it only appends the null terminator, so a missing newline tells you that the end of the line has not been reached and you can do your whole realloc() thing.
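A sketch of that fgets()-plus-realloc() loop, under the assumption that the input contains no embedded zero bytes. The name fgets_line and the starting size of 64 are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one whole line with fgets(), growing the
   buffer until the newline shows up. Returns a malloc()ed string
   (newline included if present), or NULL on EOF or allocation failure. */
char *fgets_line(FILE *in)
{
    size_t cap = 64;                    /* arbitrary starting size */
    size_t len = 0;
    char *buf;
    char *bigger;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                 /* newline seen: line complete */

        /* No newline yet: the buffer was too small (or input ended). */
        bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (len == 0) {                     /* EOF before any data */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

Unlike gets(), every fgets() call here is told exactly how much room is left, so pasted input of any size cannot overrun the buffer.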

Finally, one last comment on your progress so far. It's generally accepted today that you should concern yourself with Unicode. Therefore, your malloc() should take this into account. So, multiply the length you are requesting by the size of the data type, like this:
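The code from that post did not survive in the thread, so the following is only a guess at the idea, assuming wchar_t as the character type. The name wide_copy is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a heap-allocated copy of a wide string,
   or NULL if the allocation fails. The caller must free() it. */
wchar_t *wide_copy(const wchar_t *src)
{
    size_t n;
    wchar_t *copy;

    assert(src != NULL);
    n = wcslen(src);                        /* length in wide characters */
    copy = malloc((n + 1) * sizeof *copy);  /* element count times element size */
    if (copy == NULL)
        return NULL;

    wmemcpy(copy, src, n + 1);              /* includes the terminating L'\0' */
    return copy;
}
```

Writing sizeof *copy instead of a hardcoded number means the allocation stays correct if the element type changes later.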

As a general rule, be very careful with realloc. In particular don't do this:

p = realloc(p,nbytes)

What if realloc returns null? Then you wind up wiping out your original pointer p. Even if it doesn't return null, you also have to be mindful of updating references to the original malloc'd block should realloc move a chunk of that memory to a new location.
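One common way around this, shown here as a sketch, is to put the result of realloc() into a temporary pointer and only overwrite the original on success. The name resize_buffer is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *pp to nbytes. On success *pp is updated
   and 0 is returned; on failure *pp is left untouched and -1 is returned. */
int resize_buffer(char **pp, size_t nbytes)
{
    char *tmp;

    assert(pp != NULL && nbytes > 0);
    tmp = realloc(*pp, nbytes);   /* result goes into a temporary first */
    if (tmp == NULL)
        return -1;                /* old block is still valid and still owned */

    *pp = tmp;                    /* the block may have moved: use *pp from now on */
    return 0;
}
```

Any other pointers you kept into the old block (for example a cursor into the middle of the string) are not updated by this and must be recomputed from the new base address after a successful call.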
