strtok safety

How safe is strtok when I get the string from std::string::c_str()? I'd imagine it's considerably safer than if I were dealing with pointers from other origins, but I figured I'd see what you guys thought. strtok is a lot faster than other methods which use the functionality of std::string or std::stringstream.

If you really want strtok for strings you could roll your own, and I don't think it would be any slower than strtok necessarily, since you would have to do things like make a copy of the string, anyway.

The reason why strtok would be faster is that it doesn't create new strings and instead puts null-terminators in the existing one and returns pointers to substrings.

I think boost::split can store the results as a collection of iterator_ranges without copying the substrings. You might try that or use a similar idea. (Because you get the beginning and end of the substrings, rather than a single pointer to a C-string, you won't need to modify the string at all.)
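To illustrate the "similar idea" without Boost, here is a minimal sketch that returns views into the original string instead of copies. It assumes C++17 for std::string_view, and split_views is a hypothetical name, not anything from Boost or the standard library. The views are only valid while the original string is alive and unmodified.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on a single delimiter character and
// returns views into the original buffer. Nothing is copied or modified.
// Unlike strtok, empty fields between adjacent delimiters are kept.
std::vector<std::string_view> split_views(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            // Last field: everything from `start` to the end of the text.
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // continue just past the delimiter
    }
    return parts;
}
```

Each view is a begin/length pair rather than a C-string, so it cannot be passed to functions that expect null termination.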

The reason why strtok would be faster is that it doesn't create new strings and instead puts null-terminators in the existing one and returns pointers to substrings.

and in my case, I'm copying the returned strings into elements of a std::vector<std::string>. Since the original string, returned from std::string::c_str(), is basically guaranteed to be constructed sanely, and the returned value from strtok() is basically guaranteed to be a sanely constructed substring of that original string, I would think that this method would be reasonably safe under most circumstances.

Other methods would be a lot 'faster' to write and your code would be up and running by now, rather than fiddling with strtok to make it work.

The code has in fact been up and running for almost two years now, and has been working just fine. I just want it to be faster.

WHEN your program is finished, and you've PROVEN this is a bottleneck with a profiler, THEN you can think about performance tuning.
All the tokenising approaches are fast, compared to say file I/O.

It is a bottleneck. I wouldn't be asking these questions if it weren't an issue.

Doing away with the const_cast and first copying the string to a buffer on the heap would obviously be safer. Copying a block of memory is relatively fast, so there wouldn't be a big performance hit there, and I'm thinking that so long as I check for errors allocating memory and free the buffer when I'm done, it should be only slightly less safe than std::string and its friends. As long as strtok() and strcpy(), or for that matter strncpy(), don't misbehave, it should be fine.
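A rough sketch of that copy-then-tokenise approach follows. It is not the poster's actual code: split_with_strtok is a hypothetical name, and it swaps the manually allocated buffer and strncpy() for a std::vector<char>, so the memory is released automatically (this assumes C++11 for vector::data()).

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenises a writable copy of `input`, so the
// std::string's own buffer is never modified and no const_cast is needed.
std::vector<std::string> split_with_strtok(const std::string& input,
                                           const char* delims) {
    assert(delims != nullptr);

    // Copy the characters plus the terminating null into a heap buffer.
    std::vector<char> buffer(input.c_str(), input.c_str() + input.size() + 1);

    std::vector<std::string> tokens;
    // strtok overwrites delimiters in `buffer` with '\0' and returns
    // pointers into it; each token is copied into its own std::string.
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Two caveats: strtok() keeps hidden static state, so it is not safe to interleave two tokenisations or call it from several threads, and it skips empty fields, so "a,,b" yields two tokens rather than three.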

I'd imagine it's guaranteed that &str[0] will be the null-terminated string. Or does the data not have to be sequential and zero-terminated until you call c_str()? (In that case, of course, only the returned value is sequential and zero-terminated, and &str[0] still might not be.)

Does anybody know whether it's legal?

At the moment, there is no such definite guarantee, though due to a defect in the standard the wording could be regarded as ambiguous. This defect will be rectified in C++0x.

EDIT:
Oh, but if I remember correctly the guarantee will be that the storage is contiguous. I do not think that there will be any guarantee of null termination (such a change would not make sense), so you would have to add a null character to the end in order to use &str[0] with strtok().

you would have to add a null character to the end in order to use &str[0] with strtok().

Which brings me back to using strncpy() to copy to a buffer allocated on the heap. When I run a loop ahead of the strtok() call to determine the number of elements (I'm only using a single character as a delimiter anyway) and reserve space in the vector in which the results are stored, the difference in performance between strtok() and std::getline() is negligible.
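For comparison, a minimal sketch of the std::getline() variant with the same count-first, reserve-then-fill idea. The function name split_with_getline is hypothetical, and this is one way to write it, not necessarily the code that was measured above.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on a single delimiter character using
// std::getline, reserving space for the results up front.
std::vector<std::string> split_with_getline(const std::string& input,
                                            char delim) {
    std::vector<std::string> fields;

    // n delimiters give at most n + 1 fields, so reserve that many to
    // avoid reallocating the vector while it is being filled.
    const auto delimiter_count = std::count(input.begin(), input.end(), delim);
    fields.reserve(static_cast<std::size_t>(delimiter_count) + 1);

    std::istringstream stream(input);
    std::string field;
    while (std::getline(stream, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Unlike strtok(), std::getline() keeps empty fields between adjacent delimiters, so the two approaches can return different element counts for the same input.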