On 21 Jun 2006 08:53:37 +0200, Andreas Hauser <andy@splashground.de> wrote:
>
> dillon wrote @ Tue, 20 Jun 2006 23:27:14 -0700 (PDT):
> >
> > :andy wrote @ 20 Jun 2006 21:44:17 +0200:
> > :
> > :Fixed the off-by-one:
> > :http://ftp.fortunaty.net/DragonFly/inofficial/patches/strndup.patch
> >
> > Umm. That code is broken. len is only the maximum allowed length,
> > the actual string may be smaller.
> >
> > so e.g. someone might do: strndup("fubar", 16384). The returned
> > string should only be 'fubar\0', and only 6 bytes should be allocated,
> > not 16384.
>
> But when it works like that, one does not save the strlen.
> Hence I see the dislike for the function.
> I would like to have one that does not work like that.
> Is there already a name for it?

Why not call it memdup instead and drop the termination? String
functions for standard C, as broken as they are, are all based around
having a null terminator, and in your case you're actually basing it
entirely on a length (but allocating for length + 1 which is very
counter-intuitive). Not that this function really achieves anything to
begin with...

I never cared for C-style strings. To set a length for them you have
to modify them, and this means you have to re-allocate if doing
read-only tokenizing or regex extraction. In my own code I define a
structure containing a length and a pointer, and when extracting
sub-strings, simply set up such a structure defining the scope of the
sub-string. If it needs to be copied out for safe writing, it's
trivial to do, and at no point is there a need to check through for a
null terminator. If the structure itself is on the stack you don't
even need to malloc. The whole thing translates nicely into any kind
of memory usage, and works naturally with buffering data blocks since
you already know the length. An additional plus is being able to store 0
as a valid byte, which apparently matters for some encodings.

Proof of concept implemented as a header of static inline functions.
BSD license, C99, should be WARNS6 clean too. This will probably solve
your problem a lot better than yet another broken string function.
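
The attached header is not reproduced here. As a rough illustration of
the length-plus-pointer idea described above, a minimal sketch might look
like the following. The names struct slice, slice_sub, slice_to_cstr and
slice_eq are hypothetical and are not taken from the attachment; the
sketch also assumes the caller keeps the underlying buffer alive and
passes in-range offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; NOT the contents of the attached header. */
struct slice {
    const char *ptr;   /* start of the referenced bytes (not owned) */
    size_t      len;   /* number of bytes; no terminator required */
};

/* Reference a sub-range of an existing buffer without copying.
 * Assumption: the caller guarantees off + len lies within the buffer. */
static inline struct slice
slice_sub(const char *buf, size_t off, size_t len)
{
    struct slice s = { buf + off, len };
    return s;
}

/* Copy the slice out as a NUL-terminated C string for safe writing.
 * Returns NULL on allocation failure; the caller frees the result. */
static inline char *
slice_to_cstr(struct slice s)
{
    char *out;

    if (s.len == SIZE_MAX)      /* len + 1 would overflow */
        return NULL;
    out = malloc(s.len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s.ptr, s.len);
    out[s.len] = '\0';
    return out;
}

/* Compare two slices byte for byte; embedded 0 bytes are fine. */
static inline int
slice_eq(struct slice a, struct slice b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Tokenizing "key=value" then amounts to slice_sub(text, 0, 3) and
slice_sub(text, 4, 5): nothing is modified or allocated until a writable
copy is actually needed.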

:..
:> Umm. That code is broken. len is only the maximum allowed length,
:> the actual string may be smaller.
:>
:> so e.g. someone might do: strndup("fubar", 16384). The returned
:> string should only be 'fubar\0', and only 6 bytes should be allocated,
:> not 16384.
:
:But when it works like that, one does not save the strlen.
:Hence I see the dislike for the function.
:I would like to have one that does not work like that.
:Is there already a name for it?
:
:--
:Andy

You don't save the strlen no matter what. It's a string function.
If you want to call it 'strndup' then it has to be compatible with
the Linux strndup() and the strndup() implementations on other platforms.

If it isn't taking the length of the string into account, it isn't a
string function and it shouldn't be called 'str*'.

In any case, I wouldn't worry about the strlen(). We are talking
a few nanoseconds... maybe 10-20ns for most strings, and strndup()
is doing a malloc() anyway which is MUCH more expensive than strlen().
Don't try to over-optimize the functionality at the cost of creating
obfuscated code!

I need to amend this comment, because I implied that strlen() had to
be taken. In fact, it's a bit more complex than that. strndup() is
not allowed to scan the string beyond the specified maximum length
(because the string might not be terminated, as would be the case if
strndup() were used to cut out strings from a memory-mapped file).
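
To make the bounded scan concrete, here is a minimal sketch of the
semantics described above: stop at the first NUL or after the maximum
length, whichever comes first, and allocate only what is needed. The name
my_strndup is hypothetical (chosen to avoid clashing with a libc
strndup()), and this is not the patch under discussion.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name, to avoid clashing with a libc strndup(). */
char *
my_strndup(const char *s, size_t maxlen)
{
    size_t len = 0;
    char *copy;

    /* Bounded scan: stop at the first NUL or after maxlen bytes,
     * whichever comes first. s[maxlen] and beyond are never read. */
    while (len < maxlen && s[len] != '\0')
        len++;

    /* Allocate for the bytes actually found plus the terminator. */
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

With this, my_strndup("fubar", 16384) allocates 6 bytes, and a 4-byte
unterminated buffer passed with maxlen 4 is copied without reading past
its end.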

:Why not call it memdup instead and drop the termination? String
:functions for standard C, as broken as they are, are all based around
:having a null terminator, and in your case you're actually basing
:entirely off a length (but allocating for length + 1 which is very
:counter-intuitive). Not that this function really achieves anything to
:begin with...

A memdup that is not string-oriented is a fine idea, but it
would not be something we would add to libc unless there were
a pre-existing reasonably standardized function somewhere that
did that sort of operation. It's only a few lines of code but
the problem vis-a-vis putting things into libc is standardization.
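
For reference, the "few lines of code" could look roughly like this. It
is only a sketch: the name my_memdup is hypothetical, and allocating one
byte for a zero-length request is an assumption made here so that a
non-NULL result always means success.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; duplicates len raw bytes, no terminator added. */
void *
my_memdup(const void *src, size_t len)
{
    /* Assumption: ask for at least 1 byte so that a zero-length
     * request still yields a distinct, freeable pointer. */
    void *copy = malloc(len ? len : 1);

    if (copy == NULL)
        return NULL;
    if (len > 0)
        memcpy(copy, src, len);
    return copy;
}
```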

:I never cared for C-style strings. To set a length for them you have
:to modify them, and this means you have to re-allocate if doing
:read-only tokenizing or regex extraction. In my own code I define a
:...
: -- Dmitri Nikulin

People are welcome to implement their own string handling functions,
but we aren't going to put things into libc that are not standardized
across multiple platforms. C's string handling functions aren't the
best in the world, but they aren't that bad either. \0 termination
is not a big deal and strlen() is not a big deal either.

Programs which manipulate very long strings often keep track of
the length of the string themselves. For example, the cpdup utility
manipulates potentially very long file paths and it caches index points
into the path strings to avoid having to call strlen() on the whole
string.
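
As an illustration of that kind of bookkeeping (not cpdup's actual code),
a path buffer can carry its own length and hand back an index point on
each append, so descending into and returning from a directory never
rescans the whole path. The struct pathbuf type, the pathbuf_* functions
and the fixed 1024-byte capacity are all assumptions made for this
sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for this sketch; not taken from cpdup. */
struct pathbuf {
    char   buf[1024];  /* assumed fixed capacity */
    size_t len;        /* cached length of the string in buf */
};

/* Start from a root path. Returns 0 on success, -1 if it does not fit. */
static int
pathbuf_set(struct pathbuf *p, const char *root)
{
    size_t n = strlen(root);

    if (n >= sizeof(p->buf))
        return -1;
    memcpy(p->buf, root, n + 1);
    p->len = n;
    return 0;
}

/* Append "/name". Only the short component is measured; the existing
 * path is not rescanned. *mark receives the previous length so the
 * caller can pop back to it. Returns 0 on success, -1 on overflow. */
static int
pathbuf_push(struct pathbuf *p, const char *name, size_t *mark)
{
    size_t n = strlen(name);

    if (n + 1 >= sizeof(p->buf) - p->len)
        return -1;
    *mark = p->len;
    p->buf[p->len] = '/';
    memcpy(p->buf + p->len + 1, name, n + 1);
    p->len += n + 1;
    return 0;
}

/* Truncate back to a previously saved index point. */
static void
pathbuf_pop(struct pathbuf *p, size_t mark)
{
    p->buf[mark] = '\0';
    p->len = mark;
}
```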

On 6/21/06, Matthew Dillon <dillon@apollo.backplane.com> wrote:
> A memdup that is not string-oriented is a fine idea, but it
> would not be something we would add to libc unless there were
> a pre-existing reasonably standardized function somewhere that
> did that sort of operation. It's only a few lines of code but
> the problem vis-a-vis putting things into libc is standardization.
>
> People are welcome to implement their own string handling functions,
> but we aren't going to put things into libc that are not standardized
> across multiple platforms.

In neither case (memdup nor my own kit) was I talking about inclusion
into libc, which I consider completely useless because not everyone
else will do it, so it'll have to be duplicated in 'portable' code
bases anyway. I was merely saying that for Andreas' usage, he can
easily find a better way, and I gave my example of a clean foundation
for efficient and scalable memory referencing, which happens to work
well for byte-per-char strings too.

On Wed, Jun 21, 2006 at 01:20:24AM -0700, Matthew Dillon wrote:
> A memdup that is not string-oriented is a fine idea, but it
> would not be something we would add to libc unless there were
> a pre-existing reasonably standardized function somewhere that
> did that sort of operation. It's only a few lines of code but
> the problem vis-a-vis putting things into libc is standardization.

For memdup there is precedent. The situation is a bit different for
that, since it does fill a hole (copying a memory buffer). I don't like
strndup, since the meaning of "copy string up to a fixed length" is
asking for trouble. It doesn't allow checking for truncation without
killing the original intent. Ignoring truncation has created enough
problems in the past already; let's not create another API for that.

On Wed, Jun 21, 2006 at 01:13:20AM -0700, Matthew Dillon wrote:
> I need to amend this comment, because I implied that strlen() had to
> be taken. In fact, it's a bit more complex than that. strndup() is
> not allowed to scan the string beyond the specified maximum length
> (because the string might not be terminated, as would be the case if
> strndup() were used to cut out strings from a memory-mapped file).

And the unrolled loop can be even slower than strlen(). memchr would be
better for this purpose, but the issue remains: the interface has
potential for unexpected abuse / side effects.
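
A sketch of the memchr() variant, with the truncation question made
explicit, might look like this. The name dup_bounded and the hit_limit
out-parameter are hypothetical additions for illustration and are not
part of any strndup() interface. The sketch relies on the C11 wording
that memchr() behaves as if it reads sequentially and stops at the first
match, which is what makes a maxlen larger than the actual string
acceptable.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: like strndup(), but scans with memchr() and
 * reports whether maxlen was reached without finding a NUL. */
char *
dup_bounded(const char *s, size_t maxlen, int *hit_limit)
{
    /* Look for the terminator within the first maxlen bytes only. */
    const char *nul = memchr(s, '\0', maxlen);
    size_t len = (nul != NULL) ? (size_t)(nul - s) : maxlen;
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';

    /* No NUL within the limit: the source MAY continue past it. */
    if (hit_limit != NULL)
        *hit_limit = (nul == NULL);
    return copy;
}
```

Note the remaining ambiguity: dup_bounded("fubar", 5, &hit) sets hit even
though nothing was lost, because telling "exactly 5 bytes" apart from
"more than 5 bytes" would require reading s[5], which the bound forbids.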