In DMD 2.006, the definition of string was changed from const(char)[] to
invariant(char)[] (and similarly wstring and dstring). This change has no
doubt broken a fair amount of D 2.x code. While at it, I've found that the
functions in std.string use a mix of char[] and string, but only one uses
const(char)[].
It's worth thinking about what's actually best, from both coding and runtime
efficiency POVs.
Let's first look at string manipulation functions such as those in
std.string. These merely look at a passed-in string and return something -
they don't keep the string for later. They therefore need only a read-only
view of a string - they don't need to know that the string is never going to
change. Declaring these with invariant parameters therefore means that it
is often necessary to .idup a string just to pass it to one of these
functions. Moreover, if a piece of code manipulates strings with a mixture
of direct modification and calls to std.string functions, it necessitates
quite a bit of copying of strings.
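
To make the point concrete, here is a minimal sketch, written in the spelling
of later D 2.x releases (where invariant became immutable). The function
countLines is a made-up name for illustration and is not in std.string. With
a const(char)[] parameter, one function accepts all three kinds of string
and none of the calls has to copy anything.

```d
// Hypothetical read-only helper: it inspects the text but never stores or
// alters it, so a const view is all it asks for.
size_t countLines(const(char)[] text)
{
    size_t n = 0;
    foreach (char c; text)
        if (c == '\n')
            ++n;
    return n;
}

void main()
{
    char[] m = "one\ntwo\n".dup;       // mutable buffer
    const(char)[] c = m;               // read-only view of the same buffer
    immutable(char)[] i = "three\n";   // spelled invariant(char)[] in DMD 2.006

    // All three calls compile; no .dup or .idup is needed.
    assert(countLines(m) == 2);
    assert(countLines(c) == 2);
    assert(countLines(i) == 1);
}
```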
Consider, for example, a program that reads in a text file, normalises the
line breaks and then ROT13 encodes the result. Under D 1.x, this works:
----------
import std.file, std.string, std.cstream;

void main(string[] a) {
    char[] text = cast(char[]) read(a[1]);
    text = text.replace("\r\n", "\n").replace("\r", "\n");
    foreach (ref char c; text) {
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
            c += 13;
        } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
            c -= 13;
        }
    }
    dout.writeString(text);
}
----------
Note that the text is never copied after it is read in, except by
std.string.replace if it actually makes any change. In D 2.x, it's
necessary to change one line, to something like
text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;
which consequently adds two copy operations.
There are a few caveats to this example:
- the .idup is only because std.file.read currently returns a mutable
void[] - we could actually cast it to an invariant as nothing else is going
to use it
- there's no significant reason to normalise the line breaks before, rather
than after, ROT13ing it
- if all we're going to do is output it, we could output on the fly rather
than trying to modify the text in memory
but these won't be true in the more general case. There are probably plenty
of more involved examples in which there's more difference than this between
the 1.x and 2.x code.
A fairly recent discussion
http://tinyurl.com/2kgpqg
touched on the question of whether functions that generate a string and
return it should return mutable or immutable references. And the discussion
was by no means conclusive.
If a public library function returns a mutable array reference, it can mean
either:
(a) it is giving up ownership of the memory the array occupies
(b) it is giving the caller direct access to data it holds for some purpose
(std.mmfile is an example of this)
In case (a), if there's no risk of there being other references to the same
memory (as is the case if the function always allocates it) then it would
make sense to give the caller the choice of whether it should be mutable.
Indeed, the choice is already there, but not in any way that protects
against inadvertently applying it to a reference of kind (b).
Unfortunately, constness doesn't play well with copy-on-write. As already
noted, std.string's functions need no more than a read-only view of the
array. But the caller might still want what's returned to be mutable. If
the function is going to return the passed-in string, it can only (sensibly)
return a read-only view since that's what it received. While the caller
could try the down-and-dirty trick of casting away the constness, there is a
risk of catastrophic failure if some std.string implementation caches
strings for reuse.
But invariant clearly has some use. It enables the claim that "constant
data need never be copied" to work, as long as invariant is used well. So
if something receives a string from the caller and wants to save it for
later use/retrieval, declaring the parameter as invariant will mean that the
callee won't need to copy the data. So there's the benefit that, by making
it the caller's responsibility to .idup the data if necessary, it'll save
the overhead of unnecessary copying. Similarly, when something later wants
to retrieve the data, if it is invariant then there's no need to copy it. A
library and an application that uses it can share one copy of the data, and
believe that the data will never change.
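
A small sketch of that benefit, in the spelling of later D 2.x releases
(string is immutable(char)[] there). NameTable and its members are made-up
names for illustration only. The callee keeps the slice it is handed, and
the only copy is the one the caller chooses to make.

```d
// Hypothetical container that keeps the strings it is given.
class NameTable
{
    private string[] names;

    // The characters can never change, so storing the slice is safe
    // without copying them.
    void add(string name) { names ~= name; }

    string get(size_t i) { return names[i]; }
}

void main()
{
    char[] buf = "alpha".dup;
    string fixed = buf.idup;      // the caller pays for the one needed copy
    auto table = new NameTable;
    table.add(fixed);

    buf[0] = 'A';                 // later edits to buf cannot reach the table
    assert(table.get(0) == "alpha");
    assert(table.get(0).ptr is fixed.ptr);   // shared, not copied again
}
```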
To round things up, the D 2.x const/final/invariant thing is certainly
useful, but not perfect. The different storage classes/type modifiers are
good for different things. It might be worth a good think about how Phobos
uses them (or in some cases doesn't), and what is best practically for the
definitions of string/wstring/dstring. (Was there a discussion I missed?)
Of course, it would also be worth a good think about whether anything can be
added or changed in the language to improve matters. Ideas that come to my
mind are:
1. A property .invariant, which just returns the reference as an invariant
if it's already invariant, otherwise does the same as .idup. For this to
work, the runtime would have to keep a record of whether each piece of
allocated memory is invariant, which would interfere with the current
ability to cast invariance in or out - but at what cost?
2. Some type modifier such as 'unique' that would indicate that only one
reference to the data exists. I'm not sure what rules there should be to
enforce this, or if we should just go on trust. But it would be implicitly
convertible to mutable, const or invariant, enabling something like
----------
unique(int)[] rep(int i) {
    unique(int)[] result;
    result.length = i % 10;
    result[] = i;
    return result;
}

int[] twos = rep(2);
const(int)[] fives = rep(5);
invariant(int)[] twelves = rep(12);
----------
3. Some concept of const-transparent functions. One approach would be to
enable a type modifier (or the lack thereof) to be used as a template
parameter, with something like
----------
T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
...
}
char[] str;
const(char)[] cstr;
invariant(char)[] istr;
str = doSomethingWith(str);
cstr = doSomethingWith(cstr);
istr = doSomethingWith(istr);
----------
This would enable copy on write to work well. As long as nothing _within_
the function so templated relies on the distinction, the compiler could
optimise by generating only one instance, since it affects only compile-time
type checking and not code generation.
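
For comparison, ideas 2 and 3 can be roughly approximated in later D 2.x
releases without new syntax. This is only a sketch under assumptions, not
the proposed 'unique' or 'typemod' features: it relies on the rule that the
result of a pure function whose parameters carry no mutable references
converts implicitly to immutable (the later name for invariant), and on an
ordinary template parameter for the element type. Unlike the proposal, the
template produces one instance per qualifier.

```d
import std.traits : Unqual;

// Idea 2, approximated: the compiler can see that the result of this pure
// function is the only reference to the data, so it converts implicitly.
int[] rep(int i) pure
{
    auto result = new int[i % 10];
    result[] = i;
    return result;
}

// Idea 3, approximated: C is char, const(char) or immutable(char), and the
// return type follows the argument. doSomethingWith is the placeholder
// name from the post; here it merely skips leading spaces.
C[] doSomethingWith(C)(C[] param)
    if (is(Unqual!C == char))
{
    size_t i = 0;
    while (i < param.length && param[i] == ' ')
        ++i;
    return param[i .. $];
}

void main()
{
    int[] twos = rep(2);
    const(int)[] fives = rep(5);
    immutable(int)[] twelves = rep(12);
    assert(twos == [2, 2] && fives.length == 5 && twelves == [12, 12]);

    char[] str = "  a".dup;
    const(char)[] cstr = str;
    immutable(char)[] istr = "  b";
    str = doSomethingWith(str);
    cstr = doSomethingWith(cstr);
    istr = doSomethingWith(istr);
    assert(str == "a" && cstr == "a" && istr == "b");
}
```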
Comments?
Stewart.
--
My e-mail address is valid but not my primary mailbox. Please keep replies
on the 'group where everybody may benefit.

Let's first look at string manipulation functions such as those in
std.string. These merely look at a passed-in string and return
something - they don't keep the string for later. They therefore need
only a read-only view of a string - they don't need to know that the
string is never going to change. Declaring these with invariant
parameters therefore means that it is often necessary to .idup a string
just to pass it to one of these functions. Moreover, if a piece of code
manipulates strings with a mixture of direct modification and calls to
std.string functions, it necessitates quite a bit of copying of strings.

As I've remarked in another thread, it makes absolutely no sense to me
to use invariant for the library string functions, for exactly this
reason. They used to be const, which enables them to work on any kind
of string the user might want to call them on (mutable, const, or
invariant).

1. A property .invariant, which just returns the reference as an
invariant if it's already invariant, otherwise does the same as .idup.
For this to work, the runtime would have to keep a record of whether
each piece of allocated memory is invariant, which would interfere with
the current ability to cast invariance in or out - but at what cost?

Actually, this wouldn't need to have any runtime consequences, as the
invariantness-or-not of a thing is part of its static typing, so could
be determined at compile time. Of course, something can be invariant
even if it's not typed as invariant (and undecidably so), but do we
really need to worry about that? The .invariant property could simply
return the array if the array is typed as invariant, and return .idup
otherwise.
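
A minimal sketch of that compile-time version, written as a free function
in the spelling of later D 2.x releases (immutable instead of invariant).
The name toInvariant is made up here; no such property or function is
claimed to exist in the language or in Phobos.

```d
// Hypothetical stand-in for the proposed .invariant property. The decision
// is made purely from the static type of the argument.
immutable(T)[] toInvariant(T)(T[] a)
{
    static if (is(T == immutable))
        return a;         // typed as immutable: hand back the same slice
    else
        return a.idup;    // typed as mutable or const: make a copy
}

void main()
{
    string s = "abc";
    char[] m = "abc".dup;
    const(char)[] c = s;   // const view of data that is in fact immutable

    assert(toInvariant(s).ptr is s.ptr);    // no copy
    assert(toInvariant(m).ptr !is m.ptr);   // copied
    assert(toInvariant(c).ptr !is c.ptr);   // copied: only the type is seen
}
```

The last assertion shows the limitation mentioned above: data that is
invariant but only typed as const is still copied.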

3. Some concept of const-transparent functions. One approach would be
to enable a type modifier (or the lack thereof) to be used as a template
parameter, with something like
----------
T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
...
}

A proposal for doing something very much like this is already planned
for D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple
months back). It's called the 'return' storage class, which would make
the return value of a function take on the same constness or
invariantness as a parameter:
const(char)[] doSomethingWith(return const (char)[] param) {
...
}
What this does is make the constness of the return value the same as
the constness of the argument passed to 'param', each time the function
is called. You can do something similar with templates already, but you
can't make template functions virtual. The function is type-checked
with the declared types for the parameter and return (in this case,
const(char)[]).
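
For illustration, later D 2.x releases provide the inout qualifier, which
behaves much as described here; whether it matches the 'return' storage
class in every detail is not something this sketch claims. Tokenizer and
firstWord are made-up names. The method is an ordinary, non-template member
function, and its body is checked once with the data treated as read-only.

```d
class Tokenizer
{
    // inout stands for mutable, const or immutable at each call site, and
    // the return type takes the same qualifier as the argument.
    inout(char)[] firstWord(inout(char)[] s)
    {
        size_t i = 0;
        while (i < s.length && s[i] != ' ')
            ++i;
        return s[0 .. i];
    }
}

void main()
{
    auto t = new Tokenizer;
    char[] m = "hello world".dup;
    const(char)[] c = m;
    string i = "good day";

    char[] mw = t.firstWord(m);          // mutable in, mutable out
    const(char)[] cw = t.firstWord(c);   // const in, const out
    string iw = t.firstWord(i);          // immutable in, immutable out

    mw[0] = 'H';                         // allowed, mw is a mutable slice of m
    assert(m == "Hello world" && cw == "Hello" && iw == "good");
}
```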
Thanks,
Nathan Reed

As I've remarked in another thread, it makes absolutely no sense to me to
use invariant for the library string functions, for exactly this reason.
They used to be const, which enables them to work on any kind of string
the user might want to call them on (mutable, const, or invariant).

Exactly what I was thinking.

1. A property .invariant, which just returns the reference as an
invariant if it's already invariant, otherwise does the same as .idup.
For this to work, the runtime would have to keep a record of whether each
piece of allocated memory is invariant, which would interfere with the
current ability to cast invariance in or out - but at what cost?

Actually, this wouldn't need to have any runtime consequences, as the
invariantness-or-not of a thing is part of its static typing, so could be
determined at compile time. Of course, something can be invariant even if
it's not typed as invariant (and undecidably so), but do we really need to
worry about that? The .invariant property could simply return the array
if the array is typed as invariant, and return .idup otherwise.

I was thinking about the possibility of .invariant being able to detect
whether a pointer or array reference typed as const refers to data that was
created as invariant or not.

3. Some concept of const-transparent functions. One approach would be to
enable a type modifier (or the lack thereof) to be used as a template
parameter, with something like
----------
T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
...
}

A proposal for doing something very much like this is already planned for
D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple months
back). It's called the 'return' storage class, which would make the
return value of a function take on the same constness or invariantness as
a parameter:
const(char)[] doSomethingWith(return const (char)[] param) {
...
}
What this does is make the constness of the return value the same as the
constness of the argument passed to 'param', each time the function is
called. You can do something similar with templates already, but you
can't make template functions virtual. The function is type-checked with
the declared types for the parameter and return (in this case,
const(char)[]).

This looks odd to me - you change the code declaring the _parameter_ type in
order to effect a variation in the _return_ type? And how would you use the
type of parameterised constness within the body of the function?
Stewart.

A proposal for doing something very much like this is already planned for
D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple months
back). It's called the 'return' storage class, which would make the
return value of a function take on the same constness or invariantness as
a parameter:
const(char)[] doSomethingWith(return const (char)[] param) {
...
}
What this does is make the constness of the return value the same as the
constness of the argument passed to 'param', each time the function is
called. You can do something similar with templates already, but you
can't make template functions virtual. The function is type-checked with
the declared types for the parameter and return (in this case,
const(char)[]).

This looks odd to me - you change the code declaring the _parameter_ type in
order to effect a variation in the _return_ type? And how would you use the
type of parameterised constness within the body of the function?

I agree with Stewart here. (Actually, reading what's been said, I
agree with everybody, except possibly Walter and Andrei).
It makes much more sense to me to allow some kind of template-like
parameter whose value can be const, invariant or mutable.
Also useful would be stuff like
is(a : const)
is(a : invariant)
is(a : mutable)
for compile-time decision-making.

I believe that should be
bool willChange = false;
foreach (char c; text) if (inPattern(c, "A-Za-z")) { willChange = true; break; }
if (willChange)
{
    char[] s = text.dup;
    foreach (ref char c; s) {
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
            c += 13;
        } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
            c -= 13;
        }
    }
    text = assumeUnique(s);
}
(Before this release, I would have written
text = cast(string)s;
That still compiles without complaint, but assumeUnique() is better).
The test to see if the string will change is good copy-on-write
behavior. The rest is your code, adapted to how you're supposed to do
things in D2.006. First you dup text, because that string /might/ be
in ROM. Then you make your changes. When you've got what you want, you
use assumeUnique() to turn it back into a string. This does /not/ make
a copy.

There's the problem. You've made the code more complicated to make the
final copy conditional on something actually changing. In an ideal world,
it would be unnecessary to make that final copy at all (as far as the way my
example uses it is concerned).
Moreover, your code loops twice, first to see if there's anything to change
and then to perform the conversion. This in itself would take a performance
hit.
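
For what it's worth, the copy-on-write variant does not strictly need two
loops. The following is a sketch only, in the spelling of later D 2.x
releases, where assumeUnique is found in std.exception; the function name
rot13 is made up for the example. The copy is deferred until the first
character that actually changes, so unchanged input is returned as is.

```d
import std.exception : assumeUnique;

// Copy-on-write ROT13 in a single pass over the input.
string rot13(string text)
{
    char[] buf = null;                 // allocated only if something changes
    foreach (i, char c; text)
    {
        char r = c;
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm'))
            r += 13;
        else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z'))
            r -= 13;
        if (r != c)
        {
            if (buf is null)
                buf = text.dup;        // first change: copy once
            buf[i] = r;
        }
    }
    // assumeUnique converts the buffer to a string without copying it.
    return buf is null ? text : assumeUnique(buf);
}

void main()
{
    string digits = "12345";
    assert(rot13(digits).ptr is digits.ptr);   // nothing changed, no copy
    assert(rot13("Hello") == "Uryyb");
}
```

This still makes one copy whenever anything changes, so it does not remove
the copy that in-place code on a mutable buffer avoids.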

(Before this release, I would have written
text = cast(string)s;
That still compiles without complaint, but assumeUnique() is better).
The test to see if the string will change is good copy-on-write
behavior. The rest is your code, adapted to how you're supposed to do
things in D2.006. First you dup text, because that string /might/ be
in ROM. Then you make your changes. When you've got what you want, you
use assumeUnique() to turn it back into a string. This does /not/ make
a copy.

You miss the point. My example is of ad-hoc code to perform the conversion
in place, because it is the most efficient mechanism with the constraints
under which the application will ever perform it. Data always loaded into
RAM immediately before the conversion, and no desire to keep the 'before'
data once the conversion has happened.
<snip>

There are probably plenty of more involved examples in which there's more
difference than this between the 1.x and 2.x code.

If every string function you write obeys the copy-(only)-on-write
protocol, then I don't see that.

Well, I wasn't writing a string function there, so that's beside the point.
If you're implementing a complicated string-manipulating algorithm, you're
not necessarily going to separate every little step of the algorithm into a
separate function.
Stewart.

Note that the text is never copied after it is read in, except by
std.string.replace if it actually makes any change. In D 2.x, it's
necessary to change one line, to something like
text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;
which consequently adds two copy operations.

Well, no. Take a look at the source to std.string.replace(). It does not
modify the input in place - it returns the input if there are no
changes, if there are changes, it returns a *copy*. Second, text should
be declared as a string, so you do not need either of the dup's. Two
copies are made, just as with the 1.0 version, in that line of code.
You will need a third copy to do the loop which modifies the string in
place. I feel that, with strings, the advantages of invariant strings
outweigh the disadvantages.
Note that one can still do the modify-in-place D 1.0 code, and do it
very fast, by putting the tests for \r inside the loop rather than as
separate loops. The D 1.0 version isn't what you'd write if you wanted
speed, anyway.
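
A sketch of what such a combined loop might look like, assuming the text is
held in a mutable char[] buffer. The function name normaliseAndRot13 is
made up for the example. Since "\r\n" shrinks to "\n", the write position
never overtakes the read position and no second buffer is needed.

```d
// Normalises "\r\n" and lone "\r" to "\n" and applies ROT13, in place and
// in one pass. Returns the (possibly shorter) slice of the same buffer.
char[] normaliseAndRot13(char[] text)
{
    size_t w = 0;                          // write index, never ahead of r
    for (size_t r = 0; r < text.length; ++r)
    {
        char c = text[r];
        if (c == '\r')
        {
            // Swallow the '\n' of a "\r\n" pair.
            if (r + 1 < text.length && text[r + 1] == '\n')
                ++r;
            c = '\n';
        }
        else if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm'))
            c += 13;
        else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z'))
            c -= 13;
        text[w++] = c;
    }
    return text[0 .. w];
}

void main()
{
    char[] text = "Hi\r\nyou\rthere\n".dup;
    auto start = text.ptr;
    text = normaliseAndRot13(text);
    assert(text == "Uv\nlbh\ngurer\n");
    assert(text.ptr is start);             // same buffer, no copy made
}
```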

Note that the text is never copied after it is read in, except by
std.string.replace if it actually makes any change. In D 2.x, it's
necessary to change one line, to something like
text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;
which consequently adds two copy operations.

Well, no. Take a look at the source to std.string.replace(). It does not
modify the input in place - it returns the input if there are no changes,
if there are changes, it returns a *copy*.

That's basically what I said.

Second, text should be declared as a string, so you do not need either of
the dup's. Two copies are made, just as with the 1.0 version, in that
line of code.
You will need a third copy to do the loop which modifies the string in
place.

In the first of these two paragraphs, you suggest omitting the final .dup,
and then in the next, you effectively tell me to put it back in. Therein
lies my point - the "third copy" ought not to be necessary for my ad hoc
code. (I know I could cast away the invariant, but that's a rather down and
dirty trick.)

I feel that, with strings, the advantages of invariant strings outweigh
the disadvantages.

When it comes to string manipulation functions, giving the programmer the
choice, with my proposal of const-transparency, would AISI bring even more
advantages and alleviate some of the disadvantages.

Note that one can still do the modify-in-place D 1.0 code, and do it very
fast, by putting the tests for \r inside the loop rather than as separate
loops. The D 1.0 version isn't what you'd write if you wanted speed,
anyway.

Yes, that's another way to do it....
Stewart.

No, because I declared all my strings as string or wstring, not char[]
or wchar[]. And because I use, and assume, D's copy-on-write protocol.

If /all/ strings are invariant, then you're very limited in what
manipulations you can perform.

That's not true. You can do as much manipulation as you want at
creation-time. Only once the manipulation is "finished" do you cast
the result to string. (And that behavior is identical between const()
and invariant(), by the way, so the change to invariant() makes zero
difference to the source code at this point, except that you now have
the option of using the assumeUnique() function).

Only in the cases where ensuring that the reference is unique is possible.

It's always possible. Just write an opening brace, do all your string
creation, assign the string to a string variable declared outside the
scope, then write a closing brace. Voilà - invariance guaranteed,
because all the other references used in creation just went out of
scope.
Note that it is perfectly permissible to have multiple references to
an invariant string anyway - providing that all of those references
are themselves declared invariant. It's only non-invariant references
which are prohibited, which is why they're the ones you have to lose
at the scope boundary.
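
A minimal sketch of the brace technique, in the spelling of later D 2.x
releases, where assumeUnique is found in std.exception. The function
makeRuler is a made-up example. Note that nothing in the compiler checks
the uniqueness; the scope merely makes it easy to see.

```d
import std.exception : assumeUnique;

// Hypothetical example: builds "|---" style text of the given width.
string makeRuler(size_t width)
{
    string s;                          // declared outside the scope
    {
        char[] buf = new char[width];  // mutable while under construction
        buf[] = '-';
        if (width > 0)
            buf[0] = '|';
        s = assumeUnique(buf);         // no copy; buf is reset to null
        assert(buf is null);
    }
    // buf is out of scope, so no mutable reference to the data remains.
    return s;
}

void main()
{
    assert(makeRuler(4) == "|---");
}
```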

There's the problem. You've made the code more complicated to make the
final copy conditional on something actually changing.

Of course. That /is/ the copy-on-write protocol. If nothing changes,
return the original.

Moreover, your code loops twice, first to see if there's anything to change
and then to perform the conversion. This in itself would take a performance
hit.

I could have used the new munch() function instead of the first loop,
but I didn't think of it at the time I wrote the example.

You miss the point. My example is of ad-hoc code to perform the conversion
in place, because it is the most efficient mechanism with the constraints
under which the application will ever perform it. Data always loaded into
RAM immediately before the conversion, and no desire to keep the 'before'
data once the conversion has happened.

Well then there's no problem anyway. For the whole time that your
string is "under construction", then it's not a string, it's a
(mutable) array of chars. Just keep it as such, until you've finished
building the string. Then you can do everything in place.
But note that if you want to do in-place manipulation of chars, then
std.string.replace() is NOT the function to use, because that
(possibly) makes a copy. Instead, you would have a replacing loop, or
write your own in-place-replace function which operates on char arrays
(or templatized for arrays in general)
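
As a sketch of such a home-made function: the following helper is
hypothetical and not part of Phobos, and the name replaceShrinking is made
up. It is written for char arrays only and handles only replacements that
are no longer than the text they replace, which is all that line-break
normalisation needs.

```d
// Replaces every occurrence of `from` with `to` inside the buffer itself
// and returns the (possibly shorter) slice of that same buffer.
// Assumes `to` does not overlap the buffer `a`.
char[] replaceShrinking(char[] a, const(char)[] from, const(char)[] to)
{
    assert(from.length > 0 && to.length <= from.length);
    size_t w = 0;                      // write index, never ahead of r
    size_t r = 0;                      // read index
    while (r < a.length)
    {
        if (a.length - r >= from.length && a[r .. r + from.length] == from)
        {
            a[w .. w + to.length] = to[];
            w += to.length;
            r += from.length;
        }
        else
            a[w++] = a[r++];
    }
    return a[0 .. w];
}

void main()
{
    char[] text = "a\r\nb\rc\r\n".dup;
    auto start = text.ptr;
    text = replaceShrinking(text, "\r\n", "\n");
    text = replaceShrinking(text, "\r", "\n");
    assert(text == "a\nb\nc\n");
    assert(text.ptr is start);         // still the original buffer
}
```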

Well, I wasn't writing a string function there.

Then you shouldn't be calling std.string functions. What you need are
array functions.

If /all/ strings are invariant, then you're very limited in what
manipulations you can perform.

That's not true. You can do as much manipulation as you want at
creation-time. Only once the manipulation is "finished" do you cast
the result to string. (And that behavior is identical between const()
and invariant(), by the way, so the change to invariant() makes zero
difference to the source code at this point, except that you now have
the option of using the assumeUnique() function).

So effectively, you're using the word "string" to refer specifically to the
invariant kind, making "if all strings are invariant" a null condition.

Only in the cases where ensuring that the reference is unique is possible.

It's always possible. Just write an opening brace, do all your string
creation, assign the string to a string variable declared outside the
scope, then write a closing brace. Voilà - invariance guaranteed,
because all the other references used in creation just went out of
scope.

Maybe you're right ... but I'll have to see.
<snip>

You miss the point. My example is of ad-hoc code to perform the conversion
in place, because it is the most efficient mechanism with the constraints
under which the application will ever perform it. Data always loaded into
RAM immediately before the conversion, and no desire to keep the 'before'
data once the conversion has happened.

Well then there's no problem anyway. For the whole time that your
string is "under construction", then it's not a string, it's a
(mutable) array of chars. Just keep it as such, until you've finished
building the string. Then you can do everything in place.

So in other words, my code was more or less right in the first place.

But note that if you want to do in-place manipulation of chars, then
std.string.replace() is NOT the function to use, because that
(possibly) makes a copy. Instead, you would have a replacing loop, or
write your own in-place-replace function which operates on char arrays
(or templatized for arrays in general)

Having the std.string functions is useful even if they create copies. A
little bit of copying where it makes coding easier is OK for apps that
aren't performance-critical, but it's still nice not to be made to do even
more copying or down-and-dirty casting away invariant.

Well, I wasn't writing a string function there.

Then you shouldn't be calling std.string functions. What you need are
array functions.

I'm not sure what you mean....
Stewart.

So effectively, you're using the word "string" to refer specifically to the
invariant kind, making "if all strings are invariant" a null condition.

Well, it is now. :-)
I guess I should have clarified that all my strings were immutable
even back when string was const(char)[].

So in other words, my code was more or less right in the first place.

Basically, yes. /Except/ in your expectations of std.string. The
functions in std.string are for fully constructed strings, not for
strings-under-construction. I'll get back to that in a minute.

Then you shouldn't be calling std.string functions. What you need are
array functions.

I'm not sure what you mean....

I suppose I'm saying we need some extra functions. Maybe even a
library std.array. (It's not a big deal as it's relatively easy to
write these things yourself). But for functionality like in-place
replace, we should be using some specialized function like (if it
existed) std.array.replace - but definitely not std.string.replace.
Have you ever programmed in PHP? I'd like to see (almost) all the PHP
array functions available as standard in D. That way, it would be so
much easier to build char arrays in the manner that you suggest, and
then turn them into strings when they're fully built.
See http://uk2.php.net/manual/en/ref.array.php