I don't get it either. Any possible application for const in the form of code
correctness went out the window once the invariant virus forced all strings to
be invariant whether they were or not; so I still need to use dup to guarantee
that data won't change underneath me, but then I need to cast it! It doesn't
affect pure at all, because pure can be passed invariants which are just casted
- the compiler needs to use rules which are a hell of a lot more binding than
anything we can provide it to make these determinations. Now that const is not
a storage class, it's actually not possible to declare function variables which
should be stored in read-only memory (unless if it's encoded somewhere in the
thirty or so combinations you can use), which also damages pure. It's a lot
more confusing to deal with const data altogether than it used to be.
When I switched from D 1.0 to 2.0 I tried several times to port some large
pieces of code over and ultimately gave up, just as everyone has given up
trying to do it in C++. It's a hard task moving code through that kind of
change.
I've learned to handle it but I would really like to not be fighting the
compiler all the time. Is that what I'm supposed to be doing here, really?

I don't get it either. Any possible application for const in the form
of code correctness went out the window once the invariant virus
forced all strings to be invariant whether they were or not; so I
still need to use dup to guarantee that data won't change underneath
me, but then I need to cast it!

My experience is that when dup is used to guarantee that the data won't
change underneath the problem is being addressed at the wrong end. The
mutator of the data dups it, not the user.

It doesn't affect pure at all,
because pure can be passed invariants which are just casted - the
compiler needs to use rules which are a hell of a lot more binding
than anything we can provide it to make these determinations. Now
that const is not a storage class, it's actually not possible to
declare function variables which should be stored in read-only memory
(unless if it's encoded somewhere in the thirty or so combinations
you can use), which also damages pure. It's a lot more confusing to
deal with const data altogether than it used to be.
When I switched from D 1.0 to 2.0 I tried several times to port some
large pieces of code over and ultimately gave up, just as everyone
has given up trying to do it in C++. It's a hard task moving code
through that kind of change.
I've learned to handle it but I would really like to not be fighting
the compiler all the time. Is that what I'm supposed to be doing
here, really?

Can you be more specific about what was stymieing you? Perhaps we can
think of a solution.

I don't get it either. Any possible application for const in the form
of code correctness went out the window once the invariant virus
forced all strings to be invariant whether they were or not; so I
still need to use dup to guarantee that data won't change underneath
me, but then I need to cast it!

My experience is that when dup is used to guarantee that the data won't
change underneath the problem is being addressed at the wrong end. The
mutator of the data dups it, not the user.

I don't know the origin of the string or where it's going. I only know it's
"invariant (char) []" now, because you're required to have it "invariant (char)
[]" because of the viral nature of const and that's how the string is defined,
so that's what my API takes. It certainly doesn't mean that anyone is
intentionally using the API incorrectly; everything takes "invariant (char)
[]", so my function doesn't deserve special treatment without stating so.
A more accurate way would be for the string type to be "const (char) []", and
for functions which retain strings that can't change to take "invariant (char)
[]". That makes pretty good claims about the nature of the string, but it would
clearly result in lots of cast management.
I think all these problems boil down to the fact that invariant tells you about
the container rather than the object itself; but whether the object actually is
invariant is totally independent of the container. The genius of D 1.0's const
is that it makes an actual, actionable true statement about the object. That
was a HUGE benefit. This tries to push it back into C territory and I don't
think it works.
I don't buy that this is going to lead to any MP bonuses either. Const has
turned into a form of the sufficiently complex compiler fallacy, the cure for
any ail. But if you can cast const on or off, then any declaration you make
about the mutability of the data is going to be wrong at some point, and
compilers sometimes can't build the right code.

It doesn't affect pure at all,
because pure can be passed invariants which are just casted - the
compiler needs to use rules which are a hell of a lot more binding
than anything we can provide it to make these determinations. Now
that const is not a storage class, it's actually not possible to
declare function variables which should be stored in read-only memory
(unless if it's encoded somewhere in the thirty or so combinations
you can use), which also damages pure. It's a lot more confusing to
deal with const data altogether than it used to be.
When I switched from D 1.0 to 2.0 I tried several times to port some
large pieces of code over and ultimately gave up, just as everyone
has given up trying to do it in C++. It's a hard task moving code
through that kind of change.
I've learned to handle it but I would really like to not be fighting
the compiler all the time. Is that what I'm supposed to be doing
here, really?

Can you be more specific about what was stymieing you? Perhaps we can
think of a solution.

Exactly the same thing as trying to do it in C++. You're stuck iteratively
applying const to pretty much everything, and applying tons of casts to get the
compiler to shut up. D does provide the mechanisms to have proper safe
mutableCast/constCast/invariantCast functions (well, safe except that they make
it so that neither compiler nor programmer can make any good statements about
the nature of the data), but when trying to shift code around you usually just
want to make it work.
I remember that one thing that really got to me was that you had a foreach term
as invariant at that point if it weren't defined as inout. The thing is, that
was correct and could've led you to a sweet easy compiler optimisation, but
it's like const is in this balance where properly implemented it would be
brutal and ugly, so we need to back everything off until it's only at an
acceptable level of annoyance.
Oh, and templates. Templates that have nothing to do with any of these matters
keep on having to be changed to appease the const god. Dealing with templates
when you're doing really crazy stuff's bad enough without having that around.

A more accurate way would be for the string type to be "const (char)
[]", and for functions which retain strings that can't change to take
"invariant (char) []". That makes pretty good claims about the nature
of the string, but it would clearly result in lots of cast
management.

I use that all the time, it's a great idiom. What cast management needs
to be done? What I need to do is occasionally insert an .idup on the
client side because the callee wants a copy. So that's that.

I think all these problems boil down to the fact that invariant tells
you about the container rather than the object itself; but whether
the object actually is invariant is totally independent of the
container. The genius of D 1.0's const is that it makes an actual,
actionable true statement about the object. That was a HUGE benefit.
This tries to push it back into C territory and I don't think it
works.

I don't think I understand most of this, possibly because some of it is
wrong. D2's immutable does offer a solid guarantee about what's going on
and offers a programming model that makes it easy to write correct code
without undue aliasing. So C doesn't quite enter into the picture there.
Objects in a container being invariant tell a lot about the container.
That property makes the container shareable without a risk.

I don't buy that this is going to lead to any MP bonuses either.

Wait and see.

Const has turned into a form of the sufficiently complex compiler
fallacy, the cure for any ail. But if you can cast const on or off,
then any declaration you make about the mutability of the data is
going to be wrong at some point, and compilers sometimes can't build
the right code.

In D you will be able to break any design with a cast, unless you use
the not-yet-defined D2 which disallows all risky casts. So the fact that
you can cast const away is hardly changing anything.

Exactly the same thing as trying to do it in C++. You're stuck
iteratively applying const to pretty much everything, and applying
tons of casts to get the compiler to shut up.

But this is misusing const. It means it didn't belong there in the first
place. Const (and immutable) in D appear less frequently than in C++
because they provide superior guarantees.

Oh, and templates. Templates that have nothing to do with any of
these matters keep on having to be changed to appease the const god.
Dealing with templates when you're doing really crazy stuff's bad
enough without having that around.

Agreed, qualifiers do make templates a tad harder to define. Fortunately
std.traits.Unqual (to be released) can be helpful there.
Andrei
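
To make the Unqual remark concrete, here is a minimal sketch of a template that would otherwise trip over qualifiers. The helper name sum is made up for illustration; only std.traits.Unqual itself comes from the post above, and the example assumes a current Phobos where it is available.

```d
import std.traits : Unqual;

// Hypothetical helper: sums a slice whatever qualifier its elements carry.
// With a plain "T total" the += below would not compile when T is
// const(int) or immutable(int); Unqual!T strips the qualifier.
Unqual!T sum(T)(T[] values)
{
    Unqual!T total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    immutable(int)[] a = [1, 2, 3];
    const(int)[] b = [4, 5];
    int[] c = [6];

    static assert(is(Unqual!(immutable(int)) == int));
    static assert(is(Unqual!(const(char)) == char));

    assert(sum(a) == 6);
    assert(sum(b) == 9);
    assert(sum(c) == 6);
}
```

The template body never mentions const or immutable, which is roughly the point: the qualifier handling is pushed into one type expression instead of being spread through the code.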

A more accurate way would be for the string type to be "const (char)
[]", and for functions which retain strings that can't change to take
"invariant (char) []". That makes pretty good claims about the nature
of the string, but it would clearly result in lots of cast
management.

I use that all the time, it's a great idiom. What cast management needs
to be done? What I need to do is occasionally insert an .idup on the
client side because the callee wants a copy. So that's that.

So long as the object definition of string is "invariant (char) []", I can't
guarantee anything about the nature of the object because you need to cast to
"invariant (char) []" to be able to interface with any API.
The good side is that when I changed it to be defined as "const (char) []" only
one line of code made a squeak. That gives me solid actionable information. If
an API is declared as istring, then whatever you give it must not ever change.
If an API is declared as string, then whatever happens in there, it won't
change the data. Pretty good!

I think all these problems boil down to the fact that invariant tells
you about the container rather than the object itself; but whether
the object actually is invariant is totally independent of the
container. The genius of D 1.0's const is that it makes an actual,
actionable true statement about the object. That was a HUGE benefit.
This tries to push it back into C territory and I don't think it
works.

I don't think I understand most of this, possibly because some of it is
wrong. D2's immutable does offer a solid guarantee about what's going on
and offers a programming model that makes it easy to write correct code
without undue aliasing. So C doesn't quite enter into the picture there.
Objects in a container being invariant tell a lot about the container.
That property makes the container shareable without a risk.

I don't buy that this is going to lead to any MP bonuses either.

Wait and see.

I don't need to wait and see when twenty years of "hinted optimisation" have
had predictable results. If the programmer can set an incorrect state doing
something which he's forced to do, then any compiler which uses this state as
an actual description about the situation will cause problems because the
compiler's opportunities to apply these optimisations will shift over the
course of the development of the program. Code works, add one line, code
doesn't work. Code works, try it on another machine, code doesn't work.
The only way I can see this working is if the compiler really did have a good
idea about the nature of the state, at which point the programmer's statements
are completely superfluous. That's not coincidentally exactly how we do these
optimisations now: determine whether the state is in such a way that we can do
this safely, and only then actually apply the optimisation.
That can involve automatic MP of a sort - I understand that's the way the
C-based MP systems work, where you tell it that the state of the program is
such and such before letting it go ahead.
Fully automatic unambiguous effective parallelisation requires that everything
the programmer says is true. It's not something that can be stuffed into a
language with pointers and external API calls. If you think you can do that,
fine. But you haven't, and nobody else has, to my knowledge.

A more accurate way would be for the string type to be "const
(char) []", and for functions which retain strings that can't
change to take "invariant (char) []". That makes pretty good
claims about the nature of the string, but it would clearly
result in lots of cast management.

I use that all the time, it's a great idiom. What cast management
needs to be done? What I need to do is occasionally insert an .idup
on the client side because the callee wants a copy. So that's that.

So long as the object definition of string is "invariant (char) []",
I can't guarantee anything about the nature of the object because you
need to cast to "invariant (char) []" to be able to interface with
any API.
The good side is that when I changed it to be defined as "const
(char) []" only one line of code made a squeak. That gives me solid
actionable information. If an API is declared as istring, then
whatever you give it must not ever change. If an API is declared as
string, then whatever happens in there, it won't change the data.
Pretty good!

I have trouble following what you're saying. If what you're saying is
essentially that in char[] is a better parameter definition than string
for functions that don't need to escape their string argument, then yes,
you are entirely right.
So what I recommend is:
void foo(in char[] s); // foo looks at s, doesn't escape it
void bar(string s); // bar needs to save s
void baz(char[] s); // baz needs to change s' contents
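
A small sketch of how those three signatures might play out in practice. The function bodies and the module-level variable saved are invented for illustration; only the parameter types come from the recommendation above.

```d
import std.stdio : writeln;

string saved; // hypothetical slot that bar stores its argument in

// foo only looks at s, so mutable, const and immutable arguments all work.
size_t foo(in char[] s)
{
    size_t spaces = 0;
    foreach (c; s)
        if (c == ' ')
            ++spaces;
    return spaces;
}

// bar keeps a reference, so it asks for data that can never change.
void bar(string s)
{
    saved = s;
}

// baz rewrites the characters in place, so it needs mutable data.
void baz(char[] s)
{
    foreach (ref c; s)
        if (c == ' ')
            c = '_';
}

void main()
{
    char[] buf = "a b c".dup;   // mutable copy of a literal
    assert(foo(buf) == 2);      // mutable converts implicitly to const
    assert(foo("x y") == 1);    // so does immutable

    bar(buf.idup);              // the caller makes the immutable copy
    baz(buf);                   // changes buf, not the saved copy
    assert(buf == "a_b_c");
    assert(saved == "a b c");
    writeln(saved, " / ", buf);
}
```

The only place a copy shows up is the call to bar, which is the one function that actually retains the data. No casts are needed anywhere.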

I think all these problems boil down to the fact that invariant
tells you about the container rather than the object itself; but
whether the object actually is invariant is totally independent
of the container. The genius of D 1.0's const is that it makes an
actual, actionable true statement about the object. That was a
HUGE benefit. This tries to push it back into C territory and I
don't think it works.

I don't think I understand most of this, possibly because some of
it is wrong. D2's immutable does offer a solid guarantee about
what's going on and offers a programming model that makes it easy
to write correct code without undue aliasing. So C doesn't quite
enter into the picture there.
Objects in a container being invariant tell a lot about the
container. That property makes the container shareable without a
risk.

I don't buy that this is going to lead to any MP bonuses either.

I don't need to wait and see when twenty years of "hinted
optimisation" have had predictable results. If the programmer can set
an incorrect state doing something which he's forced to do, then any
compiler which uses this state as an actual description about the
situation will cause problems because the compiler's opportunities to
apply these optimisations will shift over the course of the
development of the program. Code works, add one line, code doesn't
work. Code works, try it on another machine, code doesn't work.

You are completely, thoroughly losing me. I can only assume you are
misunderstanding the role of const and immutable in manycore programming
and build from there.

The only way I can see this working is if the compiler really did
have a good idea about the nature of the state, at which point the
programmer's statements are completely superfluous. That's not
coincidentally exactly how we do these optimisations now: determine
whether the state is in such a way that we can do this safely, and
only then actually apply the optimisation.
That can involve automatic MP of a sort - I understand that's the way
the C-based MP systems work, where you tell it that the state of the
program is such and such before letting it go ahead.
Fully automatic unambiguous effective parallelisation requires that
everything the programmer says is true. It's not something that can
be stuffed into a language with pointers and external API calls. If
you think you can do that, fine. But you haven't, and nobody else
has, to my knowledge.

I continue being lost. Words assemble in phrases, phrases assemble in
sentences, sentences parse properly, but I can't understand one thing. I
need to defer a response to others.
Andrei

I think what Burton is saying is that by anointing immutable(char)[] as the
type "string," you are essentially sending a message to developers that
all strings should be immutable, and all *string parameters* should be
declared immutable. What this does is force developers who want to deal
in const or mutable chars to do lots of duplication or casting,
which either makes your code dog slow, or makes your code break const.
Evidence is how (at least in previous releases) anything in Phobos that
took an argument that was a UTF-8 string of characters used the parameter
type "string", making it very difficult to use when you don't have string
types. If you want to find a substring in a string, it makes no sense
that you first have to make the argument invariant. A substring function
isn't saving a pointer to that data.
I think the complaint is simply that string is defined as
immutable(char)[] and therefore is promoted as *the only* string type to
use. Using other forms (such as in char[] or const(char)[] or char[])
doesn't look like the argument is a string, when the word "string" is
already taken to mean something else.
-Steve

I see. Phobos is being changed to accept in char[] instead of string
wherever applicable. As far as what the default "string" ought to be,
immutable(char)[] is the safest of the three so I think it should be that.
Andrei

So far this discussion has no examples. Let me toss one out as a dart board to
see what folks think:
char[] s = get_some_data(); // accesses a large buffer that I don't want to copy
size_t pos = my_find(s, "blah"); // error: char[] does not convert to string
s[pos] = 'c';
string s2 = "another string";
size_t pos2 = my_find(s2, "blah");
another_func(s2[pos2 .. pos2+4]);
size_t my_find(string haystack, string needle)
{
    // Perform some kind of search that is read-only
}
void another_func(string s) {}
---
It seems like string is the obvious parameter type to use for my_find, but it
doesn't work because we're passing mutable data in. Instead we have to use
const(char)[], which is less intuitive and uglier.
Or did I miss the thrust of the argument?

Most of this discussion seems to assume that there are only two types of
data - immutable and mutable, but there is a third type - "potentially
mutable".
invariant(char)[] --> immutable
    --> Once set, it cannot be changed by anything.
const(char)[] --> potentially mutable
    --> The compiler ensures that the routine that declares
        this will not change it, but it can be changed by
        other routines.
char[] --> mutable
    --> Anything can change this.
The alias "string" only refers to immutable stuff. Can we come up with
aliases for "potentially mutable" and "mutable" too?
I would argue that the parameter signature for my_find() needs to use the
"const(char)[]" type because that means that the coder and compiler say
that this function won't change the input but we don't particularly care if
the input is immutable, or mutable by something else.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

I would argue that the parameter signature for my_find() needs to use the
"const(char)[]" type because that means that the coder and compiler say
that this function won't change the input but we don't particularly care if
the input is immutable, or mutable by something else.

If you want it to be callable by all three variations on character
arrays, then yes.
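
For what it's worth, a sketch of the dart-board my_find with const(char)[] parameters. The naive search body and the "return haystack.length when nothing matches" convention are assumptions added here; the thread never specifies either.

```d
// Read-only search: index of the first match, or haystack.length if none.
// (The not-found convention is an assumption made for this sketch.)
size_t my_find(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return haystack.length;
    foreach (i; 0 .. haystack.length - needle.length + 1)
        if (haystack[i .. i + needle.length] == needle)
            return i;
    return haystack.length;
}

void main()
{
    char[] m = "xxblahxx".dup;         // mutable
    const(char)[] c = m;               // const view of the same data
    string s = "another blah string";  // immutable

    // One signature serves all three kinds of character array.
    assert(my_find(m, "blah") == 2);
    assert(my_find(c, "blah") == 2);
    assert(my_find(s, "blah") == 8);
    assert(my_find(s, "zzz") == s.length);

    m[my_find(m, "blah")] = 'c';       // the caller can still mutate its own buffer
    assert(m == "xxclahxx");
}
```

Neither a dup nor a cast is needed at any call site, which seems to be the property the earlier posts were asking for.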

When we first got into what to do with strings and
const/immutable/mutable, I was definitely in the camp that strings
should be mutable char[], or at worst const(char)[]. The thing is,
Andrei pointed out to me, languages that are considered very good at
dealing with strings (like Perl) use immutable strings. The fascinating
thing about strings in such languages is:
"Nobody notices they are immutable, they just work."
So what is it about immutability that makes strings "just work" in a
natural and intuitive manner? The insight is that it enables strings,
which are reference types, to behave exactly as if they were value types.
After all, it never occurs to anyone to think that the integer 123 could
be a "mutable" integer and perhaps be 133 the next time you look at it.
If you put 123 into a variable, it stays 123. It's immutable. People
intuitively expect strings to behave the same way. Only C programmers
expect that once they assign a string to a variable, that string may
change in place.
C has it backwards by making strings mutable, and it's one of the main
reasons why dealing with strings in C is such a gigantic pain. But as a
longtime C programmer, I was so used to that I didn't notice what a pain
it was until I started using other languages where string manipulation
was a breeze.
The way to do strings in D is to have them be immutable. If you are
building a string by manipulating its parts, start with mutable, when
finished then convert it to immutable and 'publish' it to the rest of
the program. Mutable char[] arrays should only exist as temporaries.
This is exactly the opposite of the way one does it in C, but if you do
it this way, you'll find you never need to defensively dup the string
"just in case" and things just seem to naturally work out.
I tend to agree that if you try to do strings the C way in D2, you'll
probably find it to be a frustrating experience.
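
The build-then-publish pattern described above could look roughly like this. makeLabel is a made-up function; .idup is the only language feature the sketch relies on.

```d
// Builds text in a private mutable buffer, then publishes it as a string.
string makeLabel(in char[] name, int count)
{
    char[] buf;              // temporary; never escapes this function
    buf ~= name;
    buf ~= " x";
    foreach (i; 0 .. count)
        buf ~= '!';
    return buf.idup;         // immutable copy handed to the rest of the program
}

void main()
{
    string a = makeLabel("abc", 2);
    string b = a;            // shares the data; safe because nobody can change it
    assert(a == "abc x!!");
    assert(b is a);          // same memory, no defensive dup needed
}
```

The .idup costs one copy at the publish step. Current Phobos also offers std.exception.assumeUnique, which skips that copy when the buffer is known to have no other references, at the price of trusting the programmer on that point.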

That is a really helpful insight. It also means string programming is a bit
different in D2 than in D1.
At some point in time, it might be helpful to add a little introduction
'howto program with strings' to the D documentation. After all, it is a
major feature of D and departure from C and C++.

I tend to agree that if you try to do strings the C way in D2, you'll
probably find it to be a frustrating experience.

That is a really helpful insight. It also means string programming is a bit
different in D2 than in D1.

Tell me about it! When I converted Bud to D2, it was a nightmare. It took
many, many hours of edit-compile cycles to get a clean compile. Then
debugging it took ages due to still trying to think in D1 string terms,
which gave me lots of weird and wrong strings during run time.
After a lot of trial and error, I finally grokked the D2 string concept and
got on top of the issue. But I wanted to have the same source support D1
and D2, which leads to a whole new set of horrors.
The lessons I've learned from this exercise include ...
(a) Wait until D2 is stabilised.
(b) Use a text macro processor if you want one source to support D1 and D2.
(c) Any project that is not a simple application might be better
re-written with D2 than converted from D1.
(d) D2 strings are a useful idea. However, one still needs const(char)[]
and char[] types, so useful mnemonics for these are a good idea.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

When we first got into what to do with strings and
const/immutable/mutable, I was definitely in the camp that strings
should be mutable char[], or at worst const(char)[]. The thing is,
Andrei pointed out to me, languages that are considered very good at
dealing with strings (like Perl) use immutable strings. The fascinating
thing about strings in such languages is:
"Nobody notices they are immutable, they just work."

That's what we said about strings in 1.0. You modify it, you copy it, or you
tell the user. The gentleman's agreement worked perfectly and that came without
a mess of keywords, without implicit or explicit restrictions on behaviour,
without having to condition templates.
Perl would be more powerful if its strings were mutable, not less, although not
by much due to the interpreter.

Perl would be more powerful if its strings
were mutable, not less, although not by much
due to the interpreter.

I think we have a terminology issue.
We have character arrays (some fixed length, others variable length -
doesn't matter). In D's world view, data can be invariant (nothing gets to
change it), const (other routines can modify it but this routine will not),
or mutable (anything can change it). So in D we have some character arrays
that are invariant (eg. Literals), some are const, and some are mutable. It
is a pity that D's term "string" is being used in discussions as if it is
synonymous with character array - but it is not. It only refers to certain
types of character arrays - the invariant ones. We really need some simple
terms for const and mutable character arrays.
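
A minimal sketch of the three kinds of character array described above,
assuming current D2 syntax (immutable rather than invariant). The aliases
cstring and mstring are hypothetical mnemonics invented for this example;
they are not part of the language or of Phobos:

```d
import std.stdio : writeln;

// `string` is only an alias; the other two spellings have no built-in name.
static assert(is(string == immutable(char)[]));

// Hypothetical mnemonics (assumption: these names exist nowhere in D itself).
alias cstring = const(char)[]; // read-only view; someone else may still write
alias mstring = char[];        // anyone holding it may write

void main()
{
    string  lit   = "abc";     // literals are immutable
    mstring buf   = "abc".dup; // mutable copy
    cstring view1 = lit;       // immutable -> const: implicit
    cstring view2 = buf;       // mutable   -> const: implicit

    buf[0] = 'X';              // allowed, and visible through view2
    writeln(view2);

    // No writing through a const or immutable view.
    static assert(!__traits(compiles, view1[0] = 'X'));
    static assert(!__traits(compiles, lit[0] = 'X'));
    // No implicit conversion between mutable and immutable, either way.
    static assert(!__traits(compiles, { string s = buf; }));
    static assert(!__traits(compiles, { mstring m = lit; }));
}
```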
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

I don't think char[] is half bad. const(char)[] is a mouthful, but most
of the time those are function parameters, where the handy in char[]
applies.
Andrei

That's what we said about strings in 1.0. You modify it, you copy it,
or you tell the user. The gentleman's agreement worked perfectly and
that came without a mess of keywords, without implicit or explicit
restrictions on behaviour, without having to condition templates.

The one flaw in it was the behavior I consistently saw of "I'm copying
the string just to be sure I own it and nobody else changes it." D was
meant for copy-on-write, which means copy the string *only* if you
change it. No defensive copying. No "just in case" copying. The
gentleman's agreement failed as far as I could tell.
With immutable strings, the gentleman's agreement is enforced.
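
To make the copy-on-write convention concrete, here is a small sketch. The
function upperCased is hypothetical (it is not a Phobos function) and only
handles ASCII; the point is that it returns the caller's data untouched when
there is nothing to change, and copies exactly once otherwise:

```d
import std.ascii : isLower, toUpper;
import std.exception : assumeUnique;

// Copy-on-write by convention: duplicate only at the first character
// that actually has to change.
string upperCased(string s)
{
    char[] copy = null;                 // allocated lazily
    foreach (i, char c; s)
    {
        if (isLower(c))
        {
            if (copy is null)
                copy = s.dup;           // the one and only copy
            copy[i] = cast(char) toUpper(c);
        }
    }
    // Nothing changed: hand back the caller's immutable data as-is.
    if (copy is null)
        return s;
    // `copy` was created here and never escaped, so publishing it is sound.
    return assumeUnique(copy);
}

void main()
{
    string shout = "ABC";
    assert(upperCased(shout) is shout);   // same memory, no copy made
    assert(upperCased("aBc") == "ABC");   // copied once, then modified
}
```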

What about automatic, built-in copy on write?

Then it would happen even when you *know* you're the only one with a
reference. Worse, it'd happen multiple times if you modify multiple
characters in a row...

What about automatic, built-in copy on write?

No go with threads. COW sounded like a great idea for std::string in
ancient times when threads were a rarity. Today, virtually all C++
implementations actively dropped COW and replaced it with eager copy +
small string optimization for short strings. D really has the best of
all worlds solution.
Andrei

That's what we said about strings in 1.0. You modify it, you copy it,
or you tell the user. The gentleman's agreement worked perfectly and
that came without a mess of keywords, without implicit or explicit
restrictions on behaviour, without having to condition templates.

The one flaw in it was the behavior I consistently saw of "I'm copying
the string just to be sure I own it and nobody else changes it." D was
meant for copy-on-write, which means copy the string *only* if you
change it. No defensive copying. No "just in case" copying. The
gentleman's agreement failed as far as I could tell.
With immutable strings, the gentleman's agreement is enforced.

Am I going to become a broken record on this? Because "invariant (char) []" is
the string type, data that is going to be mutable will always find its way into
that type in order to deal with an API which WILL use string as its arguments,
not writing out "const (char) []". It gives me no information about the future
of the object while removing the apparent need for the gentleman's agreement.
Therefore I have no way of knowing what the actual pedigree of this string I've
been given is. It may be invariant, it may be mutable.
I want this to be addressed directly. Exactly how am I wrong on this point? Is
it not conceivable that mutable data gets casted to invariant in this case?

Am I going to become a broken record on this? Because
"invariant (char) []" is the string type, data that
is going to be mutable will always find its way into
that type in order to deal with an API which WILL use
string as its arguments, not writing out
"const (char) []".

I'm starting to think that 'string' for function parameters should be a
rare thing. For a function to insist that it only receives immutable data
sounds like the function is worried that it might accidentally change data.
And that sounds like a bug to me. It is shifting the responsibility to the
caller for the data's integrity.

It gives me no information about
the future of the object while removing the apparent
need for the gentleman's agreement. Therefore I have
no way of knowing what the actual pedigree of this
string I've been given is. It may be invariant, it
may be mutable.

But why would your function care about that? Let's assume your function's
signature is 'const' for its parameters because it does not intend to
modify any of them. If the caller passes invariant data then your function
cannot modify the arguments. If the caller passes mutable data, the
compiler won't allow your function to modify the parameters either, due to
the const signature. So why is it important that the function should know
the mutability of the passed data?
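
A small sketch of that argument, using a hypothetical read-only function
(countSpaces is made up for the example). The same const signature accepts
both kinds of caller, and the compiler refuses any write inside it:

```d
// A function that only reads its argument: const(char)[] accepts
// mutable, const and immutable character arrays alike.
size_t countSpaces(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s)
        if (c == ' ')
            ++n;
    // s[0] = '_';   // would not compile: cannot write through const
    return n;
}

void main()
{
    string fixed   = "a b c";         // immutable data
    char[] scratch = "a b c d".dup;   // mutable data
    assert(countSpaces(fixed) == 2);
    assert(countSpaces(scratch) == 3);
}
```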
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

If I may restate your case, it is that given a function that does
something with character arrays:
int foo(string s);
and you wish to pass a mutable character array to it. If foo was
declared as:
int foo(const(char)[] s);
then it would just work. So why is it declared immutable(char)[] when
that isn't actually necessary?
The answer is to encourage the use of immutable strings. I believe the
future of programming will tend towards ever more use of immutable data,
as immutable data:
1. is implicitly sharable between threads
2. is more conducive to static analysis of programs
3. makes it easier for programmers to understand code
4. enables better code generation
5. allows taking a private reference without needing to make a copy
const(char)[], on the other hand, still leaves us with the temptation to
make a copy "just in case". If I, as a user, see:
int foo(const(char)[] s)
what if foo() keeps a private reference to s (which it might if it does
lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to
fix it, I do:
foo(s.dup); // defensive copy in case foo keeps a reference to s
But the implementor of foo() doesn't know it's getting its own private
copy, so the first line of foo() is:
int foo(const(char)[] s)
{
s = s.dup; // make sure we own a copy
}
so the defensive, robust code has TWO unnecessary copies.
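
For contrast, a sketch of the immutable version of the same situation. The
Deferred type is hypothetical and stands in for any callee that keeps a
private reference; with a string parameter neither side needs a defensive
copy, and a caller holding mutable data makes exactly one visible copy:

```d
// Hypothetical type that keeps the reference it is given (for example to
// evaluate something lazily later on).
struct Deferred
{
    private string text;              // stored as-is, no defensive .dup

    this(string s) { text = s; }      // immutable: nobody can change it later

    string value() const { return text; }
}

void main()
{
    char[] buf = "hello".dup;

    // Mutable data is rejected at compile time rather than copied "just in case".
    static assert(!__traits(compiles, Deferred(buf)));

    // The caller makes the single, visible copy at the boundary...
    auto d = Deferred(buf.idup);
    buf[0] = 'J';                     // ...so later mutation cannot reach it
    assert(d.value == "hello");

    // A caller that already holds a string pays for no copy at all.
    string s = "world";
    assert(Deferred(s).value is s);
}
```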

int foo(const(char)[] s)
what if foo() keeps a private reference to s (which it might if it does
lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to
fix it, I do:
foo(s.dup); // defensive copy in case foo keeps a reference to s

In foo's defence, if it takes a private reference, then it should also take
a copy. In fact, should it be allowed to take a private reference of data
which might be modified after it returns?
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

int foo(const(char)[] s)
what if foo() keeps a private reference to s (which it might if it does
lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to
fix it, I do:
foo(s.dup); // defensive copy in case foo keeps a reference to s

In foo's defence, if it takes a private reference, then it should also take
a copy.

Yup, and as I said, an extra copy "just in case".

In fact, should it be allowed to take a private reference of data
which might be modified after it returns?

Instead of adding more complexity to const so it acts more like
immutable, why not just use immutable <g> ?

If I may restate your case, it is that given a function that does
something with character arrays:
int foo(string s);
and you wish to pass a mutable character array to it. If foo was
declared as:
int foo(const(char)[] s);
then it would just work. So why is it declared immutable(char)[] when
that isn't actually necessary?
The answer is to encourage the use of immutable strings. I believe the
future of programming will tend towards ever more use of immutable data,
as immutable data:
1. is implicitly sharable between threads

In fact const data is also implicitly sharable between threads. This is
because shared is not implicitly convertible to const. No?
Andrei

In fact const data is also implicitly sharable between threads.

No. You have to declare it "shared const" to make it sharable between
threads.

In fact const data is also implicitly sharable between threads.

No. You have to declare it "shared const" to make it sharable between
threads.

Sorry, I got confused. What I meant was that a function accepting a
const T can count on other threads leaving T alone, which is the
converse of what you say. Cool!
Andrei

If I may restate your case, it is that given a function that does
something with character arrays:
int foo(string s);
and you wish to pass a mutable character array to it. If foo was
declared as:
int foo(const(char)[] s);
then it would just work. So why is it declared immutable(char)[] when
that isn't actually necessary?

No, that's not the problem at all. The problem is this line in object.d:
alias invariant (char) [] string;
There are two interesting features here. It's what D calls a string, and it's
invariant, making the declaration that whoever has a reference to that string
can hold onto it forever without ever expecting its contents to be modified or
destroyed.
So, while building my string I use a function which replaces matching
substrings in a string with another string. If that function were to declare my
parameters as strings they'd do two things: they'd tell the reader that its
parameters can never change over the course of the program because it may
retain copies of the parameters. That is a strong, highly prescriptive
statement. So I would expect the function to be implemented like this:
const (char) [] replace (const (char) [] s, const (char) [] from, const (char) [] to)
But it's not. std.string.replace is implemented like this:
string replace (string s, string from, string to)
This is for a number of reasons. It's easiest to assume that the default is
going to be the correct one. The const syntax is hard to read, so it's avoided,
and "string" is more readily descriptive than "const (char) []". So, I pass my
mutable string to std.string.replace, which only accepts invariant data.
This wouldn't be too bad because const is worthless when optimising, but if
invariant is going to be given any weight then we must never cause data to be
casted to invariant unless if it's actually invariant data. So, the sensible
default is "const (char) []" for strings, a selection of aliases in object.d
for the others, and safe casting templates in object.d.

What I interpret from this is that you see strings as fundamentally
mutable character arrays, and sometimes in special cases you can make
them immutable. I propose turning that view on its head - regard strings
as fundamentally immutable, and having a mutable char array is a rare
thing that only appears in isolated places in the program.
In the find() example, the implementation of it actually uses a mutable
char[] to build the result. When the result is done, it is converted to
immutable and "published" by returning it. The mutable array never
escapes the function; it is completely sandboxed in.
What sold me on immutable strings was going through my code and looking
to see where I *actually* was mutating the strings in place rather than
just passing them around or storing them or copying them into another
buffer. It turns out it was a vanishingly small number. I was startled.
Not only that, those places could be, with a minor bit of refactoring,
further reduced in number without sacrifice. I stacked this against the
gain by eliminating all those places that were doing copies, and it was
clear that immutable strings as default was a winner.
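
A sketch of the "build privately, then publish" pattern described above. The
function repeatWord is hypothetical, and using std.exception.assumeUnique
for the final conversion is an assumption made for this example; the message
above does not say which mechanism the library code uses:

```d
import std.exception : assumeUnique;

// Build the result in a private mutable buffer, then publish it as a string.
string repeatWord(string word, size_t times)
{
    char[] buf = new char[word.length * times];
    foreach (i; 0 .. times)
        buf[i * word.length .. (i + 1) * word.length] = word[];
    // `buf` is the only reference to this memory, so treating it as
    // immutable from here on is safe. The mutable array never escapes.
    return assumeUnique(buf);
}

void main()
{
    assert(repeatWord("ab", 3) == "ababab");
}
```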

What I interpret from this is that you see strings as fundamentally
mutable character arrays, and sometimes in special cases you can make
them immutable. I propose turning that view on its head - regard strings
as fundamentally immutable, and having a mutable char array is a rare
thing that only appears in isolated places in the program.

No, I don't. You are misunderstanding me, and I'm not sure why or how. Here's a
(contrived) example of where my concern may come into play:
int [] a = new int [1];
a [0] = 1;
auto b = cast (invariant (int) []) a;
a [0] += b [0];
a [0] += b [0];
writef ("%s\n", a [0]);
// Normal result: 4.
// Optimiser which assumes invariant data can't change: 3
Yes, the code is an abuse of the const system. THAT'S EXACTLY MY POINT. Casting
mutable data to invariant leads to situations like these. Only data which will
never change can be made invariant. Putting "alias invariant (char) [] string"
in object.d induces these situations and makes it seem like it's a good idea.

What I interpret from this is that you see strings as fundamentally
mutable character arrays, and sometimes in special cases you can
make them immutable. I propose turning that view on its head -
regard strings as fundamentally immutable, and having a mutable
char array is a rare thing that only appears in isolated places in
the program.

No, I don't. You are misunderstanding me, and I'm not sure why or
how.

I guess I just cannot figure out where you're coming from.

Here's a (contrived) example of where my concern may come into
play:
int [] a = new int [1];
a [0] = 1;
auto b = cast (invariant (int) []) a;
a [0] += b [0];
a [0] += b [0];
writef ("%s\n", a [0]);
// Normal result: 4.
// Optimiser which assumes invariant data can't change: 3
Yes, the code is an abuse of the const system. THAT'S EXACTLY MY
POINT. Casting mutable data to invariant leads to situations like
these. Only data which will never change can be made invariant.
Putting "alias invariant (char) [] string" in object.d induces these
situations and makes it seem like it's a good idea.

I'm still not understanding you, because this is a contrived example
that I cannot see the point of nor can I see where it would be
legitimately used.

I can see Burton's concern, and I'm very surprised that the compiler allows
this to happen. Here is a slightly more explicit version of Burton's code.
import std.stdio;
void main()
{
int [] a = new int [1];
a [0] = 1;
invariant (int) [] b = cast (invariant (int) []) a;
writef ("a=%s b=%s\n", a [0], b[0]);
a [0] += b [0];
writef ("a=%s b=%s\n", a [0], b[0]);
a [0] += b [0];
writef ("a=%s b=%s\n", a [0], b[0]);
}
The problem is that we have declared 'b' as invariant, but the program is
allowed to change it. That is the issue.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

The following also compiles:
char c;
int* p = cast(int*)&c;
*p = 5;
and is clearly buggy code. Whenever you use a cast, the onus is on the
programmer to know what they are doing. The cast is an escape from the
typing system.

Walter, you have side-stepped the problem in question by talking about a
totally different problem.
Burton's code says "b is invariant", but the program allows it to be
changed. Your code does NOT say that any of those variables is invariant.
The problem is NOT with the cast (although that is a totally different
issue). The problem is that the code says "invariant" but the data gets
changed anyhow. The method of changing the data is not the issue. The issue
is that it gets changed at all.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Walter, you have side-stepped the problem in question by talking about a
totally different problem.

It's the same issue. When you use a cast, you are subverting the type
system. That means you have to be sure you are doing it right. The
compiler cannot help you.

Burton's code says "b is invariant", but the program allows it to be
changed. Your code does NOT say that any of those variables is invariant.
The problem is NOT with the cast (although that is a totally different
issue). The problem is that the code says "invariant" but the data gets
changed anyhow. The method of changing the data is not the issue. The issue
is that it gets changed at all.

When you cast something to immutable, you can no longer change it. It's
a one-way ticket.

Walter, you have side-stepped the problem in question by talking about a
totally different problem.

It's the same issue. When you use a cast, you are subverting the type
system. That means you have to be sure you are doing it right. The
compiler cannot help you.

I disagree. There are two issues being discussed. One by Burton and another
by yourself. Burton is showing how data declared as immutable can be
modified. You are talking about the dangers of using casts. NOT THE SAME
ISSUE.

Burton's code says "b is invariant", but the program allows it to be
changed. Your code does NOT say that any of those variables is invariant.
The problem is NOT with the cast (although that is a totally different
issue). The problem is that the code says "invariant" but the data gets
changed anyhow. The method of changing the data is not the issue. The issue
is that it gets changed at all.

When you cast something to immutable, you can no longer change it. It's
a one-way ticket.

Walter, did you actually see and run that code? I did cast something to
immutable but it got changed anyway. Is this a compiler bug or what? You
say that if one casts something to immutable that therefore one can no
longer change it - but it DID get changed.
I know that a better way to code this example would have been to use the
.idup functionality, but that is not the point. I relied on the compiler
ensuring that everything declared as immutable would not be modified. The
compiler failed.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Walter, you have side-stepped the problem in question by talking about a
totally different problem.

It's the same issue. When you use a cast, you are subverting the type
system. That means you have to be sure you are doing it right. The
compiler cannot help you.

I disagree. There are two issues being discussed. One by Burton and another
by yourself. Burton is showing how data declared as immutable can be
modified. You are talking about the dangers of using casts. NOT THE SAME
ISSUE.

I know that a better way to code this example would have been to use the
.idup functionality, but that is not the point. I relied on the compiler
ensuring that everything declared as immutable would not be modified. The
compiler failed.

It is the same issue. When you use a cast, you are *explicitly*
defeating the language's type checking ability. It means that the onus
is on the one doing the cast to get it right.

I know that a better way to code this example would have been to use
the .idup functionality, but that is not the point. I relied on the
compiler ensuring that everything declared as immutable would not be
modified. The compiler failed.

It is the same issue. When you use a cast, you are *explicitly*
defeating the language's type checking ability. It means that the onus
is on the one doing the cast to get it right.

Except when you want invariant data, then cast is *required*.

Not at all. Calling idup (or, more elegantly, to!TargetType) or
assumeUnique under the circumstances documented by assumeUnique are all
safe. Sometimes the price for added safety is one extra copy. But the
added safety is worth it more often than not.
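
A short sketch of the copy-based route mentioned above: both .idup and
std.conv.to produce an independent immutable copy, so no cast is involved
and later writes to the mutable source cannot affect the result. The
variable names are made up for the example:

```d
import std.conv : to;

void main()
{
    char[] line = "mutable input".dup;

    string a = line.idup;          // explicit immutable copy
    string b = line.to!string;     // same effect via std.conv

    line[0] = 'M';                 // later mutation of the source
    assert(a == "mutable input");  // the copies are unaffected
    assert(b == "mutable input");
    assert(a.ptr !is line.ptr);    // `a` really is separate storage
}
```
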
As far as signatures of functions in std.string, I agree that those not
needing a string of immutable characters should just accept in Char[]
(where Char is one of the three character types). That should make
people using mutable and immutable strings equally joyous.
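
As an illustration of what such a signature could look like, here is a
hypothetical function (countAsciiSpaces is not a Phobos function) written
against in Char[]. It is a sketch of the idea, not the actual std.string
code:

```d
import std.traits : isSomeChar;

// Accepts mutable, const and immutable arrays of char, wchar or dchar.
size_t countAsciiSpaces(Char)(in Char[] s)
    if (isSomeChar!Char)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] scratch = " x ".dup;                   // mutable char[]
    assert(countAsciiSpaces(scratch) == 2);
    assert(countAsciiSpaces("a b c") == 2);       // string
    assert(countAsciiSpaces("a b"w) == 1);        // wstring
}
```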

At that
point, it is a language feature, not a defeat of the typesystem. I think
there is some merit to the arguments presented in this thread, but I
don't think the answer is to get rid of invariant. Perhaps make the
compiler more strict when creating invariant data? I liked the ideas
that people presented about having unique mutable references (before and
in this thread). This might even be solvable in a library given all the
advances in structs.

Unique has been discussed extensively a couple of years ago when we were
defining const and immutable. We decided to forgo it and go with
assumeUnique.

So it's not exactly the same issue, because in one you are doing
something totally useless and stupid. And in the other, it is a language
*requirement* to use casting to get invariant data. However, in both
cases, the onus is on the developer, which sucks in the latter case...

NO. It is not a language requirement. If what you have is mutable data
and someone else wants immutable data, make a copy.

Walter: Use invariant when you can, it's the best!
User: ok, how do I use it?
Walter: You need to cast mutable data to invariant, but it's on you to
make sure nobody changes the original mutable data. Casting circumvents
the typesystem, so the compiler can't help you.
User: :(

As far as signatures of functions in std.string, I agree that those not
needing a string of immutable characters should just accept in Char[]
(where Char is one of the three character types). That should make
people using mutable and immutable strings equally joyous.

So you agree that *the standard library* should avoid using immutable
when it's not strictly necessary after all. This is quite contrary
to what Walter says. Now the question is, how do any *other* libraries
differ? I usually don't care about how you use immutable in your
personal code. But if there is an XML library which takes immutables
everywhere, it seriously limits my freedom in choosing my coding style.
Immutable string manipulation produces tons of garbage objects. Every
time you want to replace '\\' with '/' you get garbage, if not as many
times as you encounter '\\' in the string. This is something we avoided
at any cost in mobile Java games. This is the attitude which makes
"large" Java applications so memory-hungry. I have a hard time
believing this magically became OK in D.

As far as signatures of functions in std.string, I agree that those not
needing a string of immutable characters should just accept in Char[]
(where Char is one of the three character types). That should make
people using mutable and immutable strings equally joyous.

So you agree that *the standard library* should avoid using immutable
when it's not strictly necessary after all.

I do.

This is quite contrary
to what Walter says.

It is. This may be because of a slight difference in philosophy; I think
any entity (function, type) of the standard library should accept the
most general types it could conceivably work with. In fact I'd be happy
to put all or most significant algorithms in std.string inside std.algorithm.

Now the question is, how do any *other* libraries
differ? I usually don't care about how you use immutable in your
personal code. But if there is an XML library which takes immutables
everywhere, it seriously limits my freedom in choosing my coding style.

I agree but disagree with "seriously".

Immutable string manipulation produces tons of garbage objects.

This I disagree with. There may be extra copies, but you also save other
copies. Also there's never risky aliasing - coding with immutable
strings is safe. D simply can't afford to seriously suggest in this day
and age a programming style with aliased mutable strings.

Every
time you want to replace '\\' with '/' you get garbage, if not as many
times as you encounter '\\' in the string. This is something we avoided
at any cost in mobile Java games. This is the attitude which makes
"large" Java applications so memory-hungry. I have a hard time
believing this magically became OK in D.

It does become OK (albeit not magically) in D because D does offer you
the option to work with char[] if you need to. I understand that the
problem of you calling into libraries only taking string still remains,
but really you can't have everything. I think we're in better shape than
most languages.
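
For the backslash example specifically, a sketch of the char[] option: the
replacement is done in place on a mutable buffer, so the loop itself
allocates nothing. The function name is hypothetical:

```d
// In-place replacement on a mutable buffer: no allocation inside the loop.
void backslashesToSlashes(char[] path)
{
    foreach (ref char c; path)
        if (c == '\\')
            c = '/';
}

void main()
{
    char[] path = `C:\dmd\src\phobos`.dup;   // one allocation, up front
    backslashesToSlashes(path);
    assert(path == "C:/dmd/src/phobos");

    // Copy once, only when handing the result to a string-only API.
    string published = path.idup;
    assert(published == "C:/dmd/src/phobos");
}
```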
Andrei

As far as signatures of functions in std.string, I agree that those not
needing a string of immutable characters should just accept in Char[]
(where Char is one of the three character types). That should make
people using mutable and immutable strings equally joyous.

Also std.conv (especially the scheduled-to-be-deprecated toInt(), etc),
and probably also the functions in std.file, such as exists().

BTW, in my own code, a major reason why string -> const(char)[] is such
a necessary change is that std.stream.readLine() returns 'char[]', not
'string'. So you can hardly do anything without a cast or an idup.
Don.

Walter: Use invariant when you can, it's the best!
User: ok, how do I use it?
Walter: You need to cast mutable data to invariant, but it's on you to
make sure nobody changes the original mutable data. Casting circumvents
the typesystem, so the compiler can't help you.
User: :(

Unfortunately, we could not come up with a typesafe scheme for going
from mutable to immutable that was reasonable. The transition is up to
the user to do correctly, but it isn't a terrible burden, and is simple
to get right.

In fact, since we last discussed that, dmd technology has made enough
strides (particularly wrt manipulation of copies) that we can define
Unique!T and something a la Java's StringBuilder with ease.
Unfortunately current bugs in constructor implementation delay
definition of such artifacts.
Andrei

The following also compiles:
char c;
int* p = cast(int*)&c;
*p = 5;
and is clearly buggy code. Whenever you use a cast, the onus is on the
programmer to know what they are doing. The cast is an escape from the
typing system.

I'm still not understanding you, because this is a contrived example
that I cannot see the point of nor can I see where it would be
legitimately used.

I can see Burton's concern, and I'm very surprised that the compiler allows
this to happen. Here is a slightly more explicit version of Burton's code.
import std.stdio;
void main()
{
int [] a = new int [1];
a [0] = 1;
invariant (int) [] b = cast (invariant (int) []) a;
writef ("a=%s b=%s\n", a [0], b[0]);
a [0] += b [0];
writef ("a=%s b=%s\n", a [0], b[0]);
a [0] += b [0];
writef ("a=%s b=%s\n", a [0], b[0]);
}
The problem is that we have declared 'b' as invariant, but the program is
allowed to change it. That is the issue.

As I see it, the cast should fail saying "can't cast a to invariant
because a is mutable", because otherwise you are producing a behaviour
that doesn't match what the code says.
In:
char c;
int* p = cast(int*)&c;
*p = 5;
there's nothing wrong about the cast because a char can be viewed as an
int. But mutable data cannot be viewed as immutable.
Walter, you say "Whenever you use a cast, the onus is on the programmer
to know what they are doing." That's true. But if the programmer made a
mistake and didn't know what she was doing, the compiler or the
runtime should yell.
For example:
class A { }
class B : A { }
class C : A { }
void foo(A a) {
C c = cast(C) a;
}
The programmer is sure "a" is really of type "C". If it is, ok,
everything works. Now, assuming it is not, this yields a null "c" which
fails at runtime if "c" is used (which will be used eventually because
that's why we are casting it). The failure is a big one: the program
halts. (well, I'd prefer an exception to be thrown, but that's another
issue). Now, if you allow a cast from mutable to immutable to continue,
the error at runtime will be very subtle and really hard to find,
because the program will continue to work but with a wrong behaviour.
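
To make the class-cast behaviour described above concrete, here is a minimal sketch reusing the same class names. The helper `describe` is hypothetical and exists only for illustration. A failed downcast in D yields `null` rather than throwing, so the mistake shows up as soon as the result is checked or dereferenced.

```d
class A { }
class B : A { }
class C : A { }

// Hypothetical helper: reports whether the dynamic type of `a` is C.
string describe(A a)
{
    C c = cast(C) a;          // dynamic cast: null if `a` is not a C
    return c is null ? "not a C" : "is a C";
}

unittest
{
    assert(describe(new C) == "is a C");
    assert(describe(new B) == "not a C");   // cast failed, result was null
}
```

Compiling with `-unittest -main` runs the checks. Nothing comparable exists for a mutable-to-invariant cast: there is no runtime value that marks the result as wrong.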

It's not interpreting a char as an int. On x86 it interprets c plus part
of the return address as an int, essentially corrupting the stack. Or it
gets an unaligned access hardware exception. Or whatever. It's anything
but correct code. It's exactly the same issue as casting mutable to
immutable: you're abusing the power the compiler gave you.
Though I agree that the line between safe and unsafe casts is rather
subtle.

This wouldn't be too bad because const is worthless when
optimising, but if invariant is going to be given any weight then
we must never cause data to be cast to invariant unless it's
actually invariant data. So, the sensible default is "const
(char) []" for strings, a selection of aliases in object.d for
the others, and safe casting templates in object.d.

mutable character arrays, and sometimes in special cases you can
make them immutable. I propose turning that view on its head -
regard strings as fundamentally immutable, and having a mutable
char array is a rare thing that only appears in isolated places in
the program.

No, I don't. You are misunderstanding me, and I'm not sure why or
how.

I guess I just cannot figure out where you're coming from.

Here's a (contrived) example of where my concern may come into
play:
int [] a = new int [1];
a [0] = 1;
auto b = cast (invariant (int) []) a;
a [0] += b [0];
a [0] += b [0];
writef ("%s\n", a [0]);
// Normal result: 4.
// Optimiser which assumes invariant data can't change: 3
Yes, the code is an abuse of the const system. THAT'S EXACTLY MY
POINT. Casting mutable data to invariant leads to situations like
these. Only data which will never change can be made invariant.
Putting "alias invariant (char) [] string" in object.d induces these
situations and makes it seem like it's a good idea.

I'm still not understanding you, because this is a contrived example
that I cannot see the point of nor can I see where it would be
legitimately used.

Obviously I made it contrived so that it's as clear as possible what the issue
is. In reality, it will be going through more layers. Here's one layer:
int [] a = new int [1];
a [0] = 1;
invariant (int) [] func (invariant (int) [] a) { return a; }
auto b = func (cast (invariant (int) []) a);
Notice this has the same pattern as std.string.replace; that's why I did that
cast.
a [0] += b [0];
a [0] += b [0];
writef ("%s\n", a [0]);
// Not optimised: 4.
// Assuming b cannot be modified: 3.
When this actually crops up in bugs the reality will be far more complex and
practically impossible to discover.
I think I've stated this warning a half-dozen times in the last three days, and
that's it, I'm done.

The solution is, when you cast things to invariant, that's a one way
ticket. You cannot continue to mutate the original. If you find you need
to, the data must be duplicated, and the duplicate made invariant. If
this is done a lot, it's time to refactor the program.
Casts are greppable, and should be reviewed anyway. In this case,
casting an array to immutable should be followed by NO further
references to the mutable version.
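
As a sketch of the "duplicate, then make the duplicate invariant" rule, here is the earlier example rewritten with a copy instead of a cast. It uses `immutable`, the spelling later compilers adopted for `invariant`, and the helper name `freeze` is hypothetical.

```d
// Hypothetical helper: returns an immutable snapshot of `data`.
// The copy made by .idup is what makes this safe; no cast is involved.
immutable(int)[] freeze(const(int)[] data)
{
    return data.idup;
}

unittest
{
    int[] a = [1];
    immutable(int)[] b = freeze(a);   // b owns its own storage

    a[0] += b[0];                     // a[0] == 2
    a[0] += b[0];                     // a[0] == 3, b[0] is still 1
    assert(a[0] == 3);
    assert(b[0] == 1);
}
```

Because `b` never aliases `a`, the result is 3 whether or not the optimiser caches `b[0]`, which is exactly the property the cast version lacks.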

You're going into undefined territory and complain that it doesn't work
as you expect. Perhaps that should issue a warning, but you're doing
something wrong and bad: you're saying that the same array is both
mutable and immutable.
Think of the other approach: once you cast an array to invariant, the
compiler finds all aliases of that array and turns them invariant. You'd
be even more upset in that case. It would have long-reaching effects
that are hard to track down.
Or you could forbid the cast. But it's a useful cast, and you really
can't get rid of it if you ever want to convert something that is
mutable to something that is invariant without copying.

A cast could be avoided if the compiler could track unique mutable references.
Then assignment to invariant could be done implicitly and make the mutable
reference no longer exist. This would require escape analysis for mutable
references. I like allowing implicit invariant casts, but I seem to be in the
minority. Doing that brings further language complexity.
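
For what it's worth, later D compilers accept a restricted form of this idea without a separate "unique" qualifier: the result of a `pure` function whose parameters carry no mutable references is known to be unaliased, so it converts to `immutable` implicitly. A minimal sketch, with the hypothetical name `makeSquares`:

```d
// Hypothetical pure factory: its only parameter is a value type, so the
// array it returns cannot alias anything the caller can still reach.
int[] makeSquares(size_t n) pure
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

unittest
{
    // No cast: the compiler can see that the result is unique.
    immutable(int)[] squares = makeSquares(4);
    assert(squares == [0, 1, 4, 9]);

    // The same function still serves callers that want a mutable array.
    int[] scratch = makeSquares(3);
    scratch[0] = 42;
    assert(scratch == [42, 1, 4]);
}
```

This is not general escape analysis. It only covers the case where uniqueness follows from the function's signature.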

That's what we said about strings in 1.0. You modify it, you copy
it, or you tell the user. The gentleman's agreement worked
perfectly and that came without a mess of keywords, without
implicit or explicit restrictions on behaviour, without having to
condition templates.

copying the string just to be sure I own it and nobody else changes
it." D was meant for copy-on-write, which means copy the string
*only* if you change it. No defensive copying. No "just in case"
copying. The gentleman's agreement failed as far as I could tell.
With immutable strings, the gentleman's agreement is enforced.

Am I going to become a broken record on this? Because "invariant
(char) []" is the string type, data that is going to be mutable will
always find its way into that type in order to deal with an API which
WILL use string as its arguments, not writing out "const (char) []".
It gives me no information about the future of the object while
removing the apparent need for the gentleman's agreement. Therefore I
have no way of knowing what the actual pedigree of this string I've
been given is. It may be invariant, it may be mutable.
I want this to be addressed directly. Exactly how am I wrong on this
point? Is it not conceivable that mutable data gets cast to
invariant in this case?

It is conceivable by means of a cast. I've explained that casts can
break any of D's guarantees, so there is nothing new in being able to
masquerade a mutable string as an immutable one. If there was a means
to implicitly convert a mutable string into an immutable one, you'd have
a case. But it either looks like you're not understanding something, or
are using a double standard when it comes to casting as applied to
immutability in particular.
There is one point where we are forced to do something gauche:
assumeUnique. We could have avoided that by introducing a "unique"
notion, but we thought we'd simplify the language by not doing so. So
far the uses of assumeUnique seem to be idiomatic and contained enough
to not be a threat, so it seems to have been a passable engineering
decision.
To recap, if an API takes a string and all you have is a char[], DO NOT
CAST IT. Call .idup - better safe than sorry. The API may evolve and
store a reference for later. Case in point: the up-and-coming
std.stdio.File constructor initially was:
this(in char[] filename);
Later on I decided to save the filename for error message reporting and
such. Now I had two choices:
(1) Leave the signature unchanged and issue an idup:
this.filename = to!string(filename); // issues an idup
(2) Change the signature to
this(string filename);
Now all client code that DID pass a string in the first place (the vast
majority) was safe _and_ efficient. The minority of client code was that
that had a char[] or a const(char)[] at hand. That code did not compile,
so it had to insert a to!string on the caller side.
As has been copiously shown in other languages, the need for
character-level mutable strings is rather rare. So most of the time you
will not traffic in char[], but instead you'll have an immutable(char)[]
to start with. This further erodes the legitimacy of your concern.
I have no idea how to make this any clearer. I explained it so many
times and in so many ways, even I understood it :o).
Andrei

Third choice: have two constructors.
this(in char[] filename) { this(filename.idup); }
this(string filename) { ... }
It's pretty easy to add a new version of a constructor or function when
you change things and need to keep backward compatibility. The question
is: should you?
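
A runnable sketch of the two-constructor idea, under some assumptions: `NamedFile` is a hypothetical stand-in that only records the name and opens nothing, and the parameter is spelled `const(char)[]` rather than `in char[]` so the example does not depend on how a given compiler interprets `in`.

```d
// Hypothetical stand-in for the File type under discussion; it only
// records the name and opens nothing.
struct NamedFile
{
    string filename;

    // Callers holding mutable or const characters pay for one copy.
    this(const(char)[] filename)
    {
        this(filename.idup);
    }

    // Callers that already hold a string share it without copying.
    this(string filename)
    {
        this.filename = filename;
    }
}

unittest
{
    string s = "data.txt";
    auto f = NamedFile(s);
    assert(f.filename.ptr is s.ptr);      // same storage, no allocation

    char[] buf = "log.txt".dup;
    auto g = NamedFile(buf);
    buf[0] = 'X';                          // does not affect the stored name
    assert(g.filename == "log.txt");
}
```

Overload resolution prefers the exact `string` match, so the copy happens only for callers that could not have provided an immutable string anyway.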
--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

My file names are constructed most of the time. And most of the time
they are simple char[]s.
It is not obvious that File should store the file name. It's not
strictly necessary. It's an *implementation detail.* Now you expose
this implementation detail through the class interface, and you do this
without any good reason. You save a 150 byte allocation per file.
Nice.
I can understand when a hash takes an immutable key. It's in the hash's
contract. Various lazy functions could take immutable input to
guarantee correct lazy execution. But I think that overall use of
immutable types should be rare and thoroughly thought-out. They should
be used only when it's absolutely, provably necessary. That's why I
think aliasing string as immutable is a mistake. It felt wrong when I
discovered D a year ago, and it feels wrong now.

It's just an example, the point being that there things are always fast
and safe. In many cases there's much more at stake and you can't rely on
idioms that allocate memory needlessly.

That may be because you are writing C in D. Immutable strings should
allow solid coding without much friction.
Andrei

My file names are constructed most of the time. And most of the time
they are simple char[]s.

string basename;
...
auto f = File(basename ~ ".txt");

But I think that overall use of
immutable types should be rare and thoroughly thought-out. They should
be used only when it's absolutely, provably necessary.

I suggest that that's exactly backwards <g>. Mutable types should be the
rare, carefully considered ones.

That's why I
think aliasing string as immutable is a mistake. It felt wrong when I
discovered D a year ago, and it feels wrong now.

I know it feels wrong. That's the C background talking. I went through
the same thing. It's sort of like OOP if you're used to C. It takes a
while before it clicks, in the meantime, it feels wrong and stupid.

Walter Bright:
This is an interesting topic.
I like immutability, but sometimes I also like mutability.

languages that are considered very good at dealing with strings (like Perl) use
immutable strings. The fascinating thing about strings in such languages is:
"Nobody notices they are immutable, they just work."

Languages that have immutable strings often have:
- "String interning", to improve performance...
- A good garbage collector, to cope with the increased allocation-deallocation
traffic.
- Sometimes the garbage collector is able to see that two unrelated strings are
equal, and keep only one of them. Experiments have shown this greatly reduces
the memory used by many Java programs.
- Strings often keep their hash value stored beside them, so it's computed only
once, the first time you actually need the hash value (this also means the hash
value is initialized to an invalid value).
People notice such strings are immutable. Usually it's fine, but once in a
while it's a pain.
Note that in Python you usually try to avoid looping on single chars because
it's far too slow, so you try to use string methods and regular
expressions as much as possible. But I like a lower level language because it
gives me the *freedom* to read and process the single chars efficiently.
(Python is implemented in C, and writing the Python interpreter itself with a
language that uses immutable strings only is probably a pain).
Such languages like Python also always offer you an escape, for example in the
standard library of Python there is a mutable char array (Python3 is different,
it has as built-ins immutable unicode strings + mutable arrays of bytes + maybe
an immutable array of bytes):
This is Python 2.5:

After all, it never occurs to anyone to think that the integer 123 could be a
"mutable" integer and perhaps be 133 the next time you look at it.

Because they are small numbers. With the multi-precision GMP library you can
mutate numbers in place because this becomes useful when you manage huge
numbers.
Note that there are Python bindings for GMP; they manage numbers in an
immutable way to respect the Python style, but that's not very efficient, see
the explanation here in the middle:
http://gmpy.sourceforge.net/

The way to do strings in D is to have them be immutable. If you are building a
string by manipulating its parts, start with mutable, when finished then
convert it to immutable and 'publish' it to the rest of the program.

Seems acceptable.

you'll find you never need to defensively dup the string "just in case" and
things just seem to naturally work out.

If you put strings in an associative array as keys, you usually want them to be
immutable to keep their correct place in the hash and avoid big troubles.
For such purpose Python has mutable and immutable arrays (named list and
tuple), where you can only use tuples as dictionary keys.
So built-in associative arrays of D too may appreciate immutable arrays more :-)
Bye,
bearophile
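
A small sketch of the associative-array point, assuming a word-counting style helper with the hypothetical name `bump`. The table is keyed by `string`, so it stores its own immutable copy and later edits to the caller's buffer cannot move a key out of its hash bucket.

```d
// Hypothetical helper: counts occurrences of `key`, copying it into the
// table only the first time it is seen.
void bump(ref int[string] counts, const(char)[] key)
{
    if (auto p = key in counts)
        ++*p;                       // existing entry: no allocation
    else
        counts[key.idup] = 1;       // new entry: store an immutable copy
}

unittest
{
    int[string] counts;
    char[] buf = "gene".dup;

    bump(counts, buf);
    bump(counts, buf);
    buf[0] = 'X';                   // the stored key is unaffected

    assert(counts["gene"] == 2);
    assert("Xene" !in counts);
}
```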

The way to do strings in D is to have them be immutable. If you are building a
string by manipulating its parts, start with mutable, when finished then
convert it to immutable and 'publish' it to the rest of the program.

Most of the time this seems acceptable.
But if such text is very long (for example, 20 MB) and you want to pass it around
for various functions to process and modify it, then you may want to keep it
mutable (this is a quite uncommon situation, but it's happened to me, during
genomic data processing).
Bye,
bearophile

When we first got into what to do with strings and
const/immutable/mutable, I was definitely in the camp that strings
should be mutable char[], or at worst const(char)[]. The thing is,
Andrei pointed out to me, languages that are considered very good at
dealing with strings (like Perl) use immutable strings. The fascinating
thing about strings in such languages is:
"Nobody notices they are immutable, they just work."
So what is it about immutability that makes strings "just work" in a
natural and intuitive manner? The insight is that it enables strings,
which are reference types, to behave exactly as if they were value types.
After all, it never occurs to anyone to think that the integer 123 could
be a "mutable" integer and perhaps be 133 the next time you look at it.
If you put 123 into a variable, it stays 123. It's immutable. People
intuitively expect strings to behave the same way. Only C programmers
expect that once they assign a string to a variable, that string may
change in place.
C has it backwards by making strings mutable, and it's one of the main
reasons why dealing with strings in C is such a gigantic pain. But as a
longtime C programmer, I was so used to that I didn't notice what a pain
it was until I started using other languages where string manipulation
was a breeze.
The way to do strings in D is to have them be immutable. If you are
building a string by manipulating its parts, start with mutable, when
finished then convert it to immutable and 'publish' it to the rest of
the program. Mutable char[] arrays should only exist as temporaries.
This is exactly the opposite of the way one does it in C, but if you do
it this way, you'll find you never need to defensively dup the string
"just in case" and things just seem to naturally work out.

So your suggestion is to do something like:
string manipulate() {
char[] buf = read_20M_string_data();
initialization_mangling(buf);
return cast(string) buf;
}
string text_20M = manipulate();
As long as we don't want to mangle the storage later in the program.
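
The build-then-publish pattern above can also be written with `assumeUnique` from `std.exception`, which was mentioned earlier in the thread. This is only a sketch: `makeLabel` is a hypothetical builder standing in for the 20 MB case, and the caller is still the one promising that no other reference to the buffer exists.

```d
import std.exception : assumeUnique;

// Hypothetical builder: fills a private mutable buffer, then publishes it.
string makeLabel(size_t n)
{
    char[] buf = new char[n];
    foreach (i, ref c; buf)
        c = cast(char)('a' + i % 26);

    // buf is the only reference to the data, so the promise made to
    // assumeUnique holds; it also sets buf to null afterwards.
    return assumeUnique(buf);
}

unittest
{
    assert(makeLabel(5) == "abcde");

    char[] tmp = "xyz".dup;
    string s = assumeUnique(tmp);
    assert(tmp is null);            // the mutable handle is gone
    assert(s == "xyz");
}
```

Compared with a bare cast, the mutable variable is nulled at the point of conversion, which makes accidental later writes through it much harder.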

I think what Burton is saying is by anointing immutable(char)[] as the
type "string," you are essentially sending a message to developers that
all strings should be immutable, and all *string parameters* should be
declared immutable.

Phobos is being changed to accept in char[] instead of string
wherever applicable. As far as what the default "string" ought to be,
immutable(char)[] is the safest of the three so I think it should be that.
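
To illustrate why a const parameter is the accommodating choice for functions that only read their argument, here is a minimal sketch. The function `countSpaces` is hypothetical, and it spells the parameter `const(char)[]` rather than `in char[]` to stay independent of compiler settings.

```d
// Hypothetical function: reads its argument but never stores or changes it,
// so it asks only for const access.
size_t countSpaces(const(char)[] text)
{
    size_t n;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

unittest
{
    string fixed = "a b c";
    char[] scratch = "x y".dup;
    const(char)[] view = scratch;

    // All three kinds of character array are accepted without a cast.
    assert(countSpaces(fixed) == 2);
    assert(countSpaces(scratch) == 1);
    assert(countSpaces(view) == 1);
}
```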

I vaguely remember someone suggesting that "string" be the alias for
immutable character arrays and "text" be the alias for mutable character
arrays. For some people, it might be easier to relate the word "text" as
being something that can be edited in-place.
I'm not advocating or rejecting this ... just trying to recall the original
poster's suggestion.
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

In D you will be able to break any design with a cast, unless you use
the not-yet-defined D2 which disallows all risky casts. So the fact that
you can cast const away is hardly changing anything.

Oh, and this is an idea which is almost exactly 20 years old, from back when
ANSI was developing the C standard. They had defined const but then made it so
that const cannot be casted off, making the language unimplementable.
We cannot avoid these APIs which led to this unimplementability because it's a
common problem: a mutable object goes through an environment where it is
treated as const, but afterwards needs to continue to be mutable. For example,
you might have:
alias const (char) [] string;
alias char [] mstring;
// Return the first match within the string, or null if there is no match.
string match (string text, RE expression);
mstring text;
auto submatch = match (text, expression);
submatch [] = ' '; // Fails.
The only non-casting alternative is to define:
mstring mmatch (mstring text, RE expression);
But there are two problems. One, it's the same exact function which does the
same exact thing. So you're wasting the programmer's time for moving match into
a template function and then overloading it twice, you're wasting the reader's
time for having to learn two functions and knowing when to apply either, and
you're wasting the processor's time for having to load both functions. More
critically, it's not descriptive; it in fact implies that mmatch may or will
modify the data within the text, when it won't, ever. So if the purpose is to
make const describe the nature of the implementation of the function, it has
completely failed and so can't be trusted anywhere.
Dennis Ritchie argued against const when it appeared
(http://www.lysator.liu.se/c/dmr-on-noalias.html).

If you have something which works everywhere please tell us because we've been
trying to find one for a long time, but as far as I know there is no solution.
The best I've ever gotten to is:
    // "A" means that the constness of the return type depends upon the
    // constness of the argument. There are dozens of ways to specify the same thing.
    const (A) mstring match (const (A) mstring text, RE expression);
But setting aside whether that helps or hinders self-documentation, that's far
from the only place at which you put mutable data through a const section that
you need to modify later. What if the function were instead:
    struct REMatch
    {
        string match; /// The matched string.
        size_t offset; /// Offset within the string where the match occurs.
        string [] groups; /// Matched groups.
        this (string text);
    }
What am I going to do about this now without using templates? If you define a
special syntax to make this work, then I can give you something even further
which won't.
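
The "A" notation above is close in spirit to the `inout` qualifier that D2 later gained. What follows is a minimal sketch, assuming a current D2 compiler. The plain substring search is a hypothetical stand-in for the RE type, which the thread never defines. It only covers the function case and does not by itself answer the REMatch struct question.

```d
// Sketch only: `needle` replaces the thread's undefined RE type (assumption).
// `inout` stands for whichever of mutable/const/immutable the caller passes,
// so one function body serves all three.
inout(char)[] match(inout(char)[] text, const(char)[] needle)
{
    if (needle.length > text.length)
        return null;
    foreach (i; 0 .. text.length - needle.length + 1)
    {
        if (text[i .. i + needle.length] == needle)
            return text[i .. i + needle.length];
    }
    return null; // no match
}

void main()
{
    // Mutable in, mutable out: the result can be written through.
    char[] buf = "hello world".dup;
    auto sub = match(buf, "world");
    static assert(is(typeof(sub) == char[]));
    sub[] = '_';
    assert(buf == "hello _____");

    // Immutable in, immutable out: same function, no cast, no copy.
    string s = "hello world";
    auto isub = match(s, "world");
    static assert(is(typeof(isub) == string));
    assert(isub == "world");
    assert(match(s, "xyz") is null);
}
```

The function body is type-checked as if the data were const, so it cannot modify the text itself, which is the guarantee the mmatch overload failed to express.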

The problem is you set up artificially constrained rules, i.e. "without
using templates". You can't use the same struct to store mutable types
and non-mutable types mixed with always-mutable types, and for good
reasons. No type system will allow 100% of the correct programs to run.
Why the fuss. Use a gorram template and call it a day.
Andrei
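
One reading of "use a template" for the REMatch struct, as a minimal sketch. Parameterising on the character type `Char` is an assumption, and so is the three-argument constructor: the original takes only the text and would presumably run the match itself.

```d
// Hypothetical templated variant of the REMatch struct from the post above.
// Char may be char, const(char) or immutable(char).
struct REMatch(Char)
{
    Char[] match;     /// The matched slice; writable only if Char is mutable.
    size_t offset;    /// Offset within the text where the match occurs.
    Char[][] groups;  /// Matched groups.

    // Assumed constructor: the caller supplies the match bounds directly.
    this(Char[] text, size_t start, size_t end)
    {
        match = text[start .. end];
        offset = start;
        groups = [match];
    }
}

void main()
{
    // Mutable instantiation: the match can be modified in place.
    char[] buf = "key=value".dup;
    auto m = REMatch!char(buf, 4, 9);
    m.match[] = '*';
    assert(buf == "key=*****");

    // Immutable instantiation: same code, but writes are rejected.
    string s = "key=value";
    auto im = REMatch!(immutable(char))(s, 4, 9);
    assert(im.match == "value");
    static assert(!__traits(compiles, im.match[] = '*'));
}
```

The cost is the one raised in the reply that follows: everything that touches REMatch has to become a template as well.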

Ah, so you don't have a solution.
What you'll have instead are programs which are developed one way but then come
to an impasse where they need to cast off const but can't. So after some
cursing, the programmer starts wasting his time modifying all of his code to be
templated, because avoiding casting off const has exactly the same viral
progression as adding in const (I did it in C++ the last time we thought const
might be able to make code better-optimised).
This goes fine, until he comes up to a library which hasn't gone through the
same process, so it has a normal interface. Or he comes up to an interface
itself, which will not normally be templatable. He doesn't have the code, so he
can't change the library. What does he do then?
There are four options I can see. One, he could make a copy of the supposedly
const data that's been trapped by the library, which won't always work. Two, he
could take the pointer and length from the slice and work out where in his
mutable data the slice lies, which might be impossible depending upon
pointer arithmetic restrictions. Three, he could reimplement the library
functionality himself, which may be impossible. Four, he could move on to a
language which doesn't make his job hard just so that it can sometimes add
numbers faster.
This is actually reminding me of C++ the more I think about it. C++'s bad
features are so bad because they're far-reaching but they couldn't be
consistently applied. So if you read the specification, you'll find twenty or
so caveats that try to make the feature work, when the proper thing to have
done was to realise that a feature which doesn't naturally fit in a language
shouldn't be in that language. Yet here we have a feature that's not just C++,
but it's C++^2.
Multiple keywords. The threat that abuse will eventually lead to code being
compiled incorrectly, coupled with forcing abuse on common code (I stress that
any data which is marked as invariant at any point but is not actually
invariant will cause optimisation issues if it's given any weight whatsoever).
At least three different ways to define a const. "const int *foo ()" and "int
*foo () const" equivalency. A syntax which makes declarations hard to read. And
coming, you say, is enforced const-correctness, just to make things extra awful.
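
The second of the four options above (recovering the mutable slice from the pointer and length of a const one) can be sketched as follows. This assumes ordinary @system code where pointer arithmetic is allowed; `mutableViewOf` is a hypothetical helper name, not a library function.

```d
// Given the mutable buffer we own and a const slice that some API handed
// back, return the matching mutable slice, or null if `view` does not lie
// entirely inside `owner`.
char[] mutableViewOf(char[] owner, const(char)[] view)
{
    const(char)* base = owner.ptr;
    const(char)* end = base + owner.length;
    if (view.ptr < base || view.ptr + view.length > end)
        return null;
    auto start = cast(size_t)(view.ptr - base);
    return owner[start .. start + view.length];
}

void main()
{
    char[] buf = "find the needle".dup;

    // Pretend a library returned this const view of our own data.
    const(char)[] view = buf[9 .. 15];
    assert(view == "needle");

    auto m = mutableViewOf(buf, view);
    m[] = '-';
    assert(buf == "find the ------");

    // A slice of unrelated memory is rejected.
    assert(mutableViewOf(buf, "other") is null);
}
```

No cast away from const appears here, but the technique only works when the caller still holds the original mutable buffer and the library returned a slice of it rather than a copy.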

The problem is you set up artificially constrained rules, i.e.
"without using templates". You can't use the same struct to store
mutable types and non-mutable types mixed with always-mutable
types, and for good reasons. No type system will allow 100% of the
correct programs to run. Why the fuss. Use a gorram template and
call it a day.

Ah, so you don't have a solution.

I do have a solution, the problem is sometimes no amount of convincing
will do any good. Arguments focusing on niche cases can be formulated
against every single restriction of a type system. Const and immutable
are supposed to express realities about data. Sometimes said realities
undergo dialectics that are difficult to express. The system is not
perfect, and cannot be made perfect within the constraints at hand (e.g.
without putting undue complexity on the programmer). If you're bent from
the get-go against const and immutable, there is no good scenario that
is good enough, no bad scenario that's infrequent and avoidable enough,
and no way to purport a gainful dialog.
Andrei

I know that a better way to code this example would have been to use
the .idup functionality, but that is not the point. I relied on the
compiler ensuring that everything declared as immutable would not be
modified. The compiler failed.

It is the same issue. When you use a cast, you are *explicitly*
defeating the language's type checking ability. It means that the onus
is on the one doing the cast to get it right.

Except when you want invariant data, then cast is *required*. At that
point, it is a language feature, not a defeat of the typesystem. I think
there is some merit to the arguments presented in this thread, but I
don't think the answer is to get rid of invariant. Perhaps make the
compiler more strict when creating invariant data? I liked the ideas
that people presented about having unique mutable references (before and
in this thread). This might even be solvable in a library given all the
advances in structs.
So it's not exactly the same issue, because in one you are doing
something totally useless and stupid. And in the other, it is a language
*requirement* to use casting to get invariant data. However, in both
cases, the onus is on the developer, which sucks in the latter case...
Walter: Use invariant when you can, it's the best!
User: ok, how do I use it?
Walter: You need to cast mutable data to invariant, but it's on you to
make sure nobody changes the original mutable data. Casting circumvents
the typesystem, so the compiler can't help you.
User: :(
-Steve

Walter: Use invariant when you can, it's the best!
User: ok, how do I use it?
Walter: You need to cast mutable data to invariant, but it's on you to
make sure nobody changes the original mutable data. Casting circumvents
the typesystem, so the compiler can't help you.
User: :(

Unfortunately, we could not come up with a typesafe scheme for going
from mutable to immutable that was reasonable. The transition is up to
the user to do correctly, but it isn't a terrible burden, and is simple
to get right.

In fact, since we last discussed that, dmd technology has made enough
strides (particularly wrt manipulation of copies) that we can define
Unique!T and something a la Java's StringBuilder with ease.
Unfortunately current bugs in constructor implementation delay
definition of such artifacts.

I know that a better way to code this example would have been to use
the .idup functionality, but that is not the point. I relied on the
compiler ensuring that everything declared as immutable would not be
modified. The compiler failed.

It is the same issue. When you use a cast, you are *explicitly*
defeating the language's type checking ability. It means that the onus
is on the one doing the cast to get it right.

Except when you want invariant data, then cast is *required*.

Not at all. Calling idup (or, more elegantly, to!TargetType) or
assumeUnique under the circumstances documented by assumeUnique are all
safe. Sometimes the price for added safety is one extra copy. But the
added safety is worth it more often than not.
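
A minimal sketch of the three routes named above, assuming current Phobos, where `assumeUnique` lives in `std.exception` and `to` in `std.conv`. The `buildGreeting` function is hypothetical.

```d
import std.conv : to;
import std.exception : assumeUnique;

// Builds a string in a private mutable buffer, then hands it out as
// immutable. This is the documented use of assumeUnique: no other
// reference to `buf` exists, so nothing can change the data afterwards.
string buildGreeting(string name)
{
    char[] buf = new char[](7 + name.length);
    buf[0 .. 7] = "Hello, ";
    buf[7 .. $] = name[];
    return assumeUnique(buf);
}

void main()
{
    char[] scratch = "abc".dup;

    // idup and to!string both make an immutable copy: safe, one allocation.
    string viaIdup = scratch.idup;
    string viaTo = to!string(scratch);

    // Later writes to the mutable original do not affect the copies.
    scratch[0] = 'X';
    assert(viaIdup == "abc");
    assert(viaTo == "abc");

    assert(buildGreeting("D") == "Hello, D");
}
```

The copies cost an allocation each; `assumeUnique` costs nothing at run time but leaves the uniqueness guarantee to the programmer.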

idup is not a viable option in certain cases, if memory usage or
performance are important. It might not even be available if the object
in question doesn't implement a function that does it. assumeUnique
suffers from the same problems as casting. It just avoids any potential
casting that changes types instead of constancy.
I agree that in 90% of cases, using invariant strings is beneficial, and
not a burden. I think the arguments against are for those small cases
where performance is significantly hindered, and having a library which
uses string where it should use in char[] makes life difficult for those
cases. I understand you are fixing this, so that is good.

As far as signatures of functions in std.string, I agree that those not
needing a string of immutable characters should just accept in Char[]
(where Char is one of the three character types). That should make
people using mutable and immutable strings equally joyous.

What might be useful is coming up with an alias for char[], like mstring
or something that shakes off the notion that a mutable char[] is not a
string.

At that point, it is a language feature, not a defeat of the typesystem. I
think there is some merit to the arguments presented in this thread,
but I don't think the answer is to get rid of invariant. Perhaps make
the compiler more strict when creating invariant data? I liked the
ideas that people presented about having unique mutable references
(before and in this thread). This might even be solvable in a library
given all the advances in structs.

Unique was discussed extensively a couple of years ago when we were
defining const and immutable. We decided to forgo it and go with
assumeUnique.

I see in your reply to Walter that it could be done with some bug fixes.
I think this would be an important step to make.

So it's not exactly the same issue, because in one you are doing
something totally useless and stupid. And in the other, it is a
language *requirement* to use casting to get invariant data. However,
in both cases, the onus is on the developer, which sucks in the latter
case...

NO. It is not a language requirement. If what you have is mutable data
and someone else wants immutable data, make a copy.

Unless copying hinders performance so much that you would rather take the
safety hit and start casting. Imagine having to copy every message that
was received on a network stream just so you could use string functions
on it. Or having to duplicate all the data read into a small buffer from
a file just to search for strings in it.

Walter: Use invariant when you can, it's the best!
User: ok, how do I use it?
Walter: You need to cast mutable data to invariant, but it's on you to
make sure nobody changes the original mutable data. Casting circumvents
the typesystem, so the compiler can't help you.
User: :(

Casts to immutable should not be part of most programs.

assumeUnique is a cast. You have not avoided that. The extra steps it
goes through are not enough to quiet this discussion. We are not talking
about using assumeUnique or casting everywhere in a program. We are
talking about how difficult it is to call functions that take strings
when they should take const(char)[] (a practice that is going to continue
since the string label has been attached to immutable(char)[] only) in
the rare cases when you need the functions.
-Steve

A more accurate way would be for the string type to be "const (char)
[]", and for functions which retain strings that can't change to take
"invariant (char) []". That makes pretty good claims about the nature
of the string, but it would clearly result in lots of cast
management.

needs to be done? What I need to do is occasionally insert an .idup on
the client side because the callee wants a copy. So that's that.

can't guarantee anything about the nature of the object because you
need to cast to "invariant (char) []" to be able to interface with any
API.
The good side is that when I changed it to be defined as "const (char)
[]" only one line of code made a squeak. That gives me solid actionable
information. If an API is declared as istring, then whatever you give
it must not ever change. If an API is declared as string, then whatever
happens in there, it won't change the data. Pretty good!

I have trouble following what you're saying. If what you're saying is
essentially that in char[] is a better parameter definition than string
for functions that don't need to escape their string argument, then yes,
you are entirely right.
So what I recommend is:
    void foo(in char[] s); // foo looks at s, doesn't escape it
    void bar(string s);    // bar needs to save s
    void baz(char[] s);    // baz needs to change s' contents
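
The three recommended signatures can be sketched with concrete bodies. The function names and the module-level `saved` variable are hypothetical, chosen only to show which callers each signature accepts.

```d
// Only reads s, so it accepts char[], const(char)[] and string alike.
size_t countSpaces(in char[] s)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

string saved; // hypothetical long-lived storage

// Keeps a reference, so it demands data that can never change.
void remember(string s)
{
    saved = s;
}

// Modifies the caller's data in place.
void blankOut(char[] s)
{
    s[] = ' ';
}

void main()
{
    char[] buf = "a b c".dup;

    // The read-only function takes mutable and immutable data without copies.
    assert(countSpaces(buf) == 2);
    assert(countSpaces("a b c") == 2);

    // The saving function rejects mutable data; the caller must copy.
    static assert(!__traits(compiles, remember(buf)));
    remember(buf.idup);

    // Mutating the buffer afterwards leaves the saved copy intact.
    blankOut(buf);
    assert(saved == "a b c");
    assert(countSpaces(buf) == 5);
}
```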

I think what Burton is saying is that by anointing immutable(char)[] as the
type "string," you are essentially sending a message to developers that
all strings should be immutable, and all *string parameters* should be
declared immutable. What this does is force developers who want to deal
in const or mutable chars to do lots of duplication or casting,
which either makes your code dog slow, or makes your code break const.
Evidence is how (at least in previous releases) anything in Phobos that
took an argument that was a utf8 string of characters used the parameter
type "string", making it very difficult to use when you don't have string
types. If you want to find a substring in a string, it makes no sense
that you first have to make the argument invariant. A substring function
isn't saving a pointer to that data.
I think the complaint is simply that string is defined as immutable(char)
[] and therefore is promoted as *the only* string type to use. Using
other forms (such as in char[] or const(char)[] or char[]) doesn't look
like the argument is a string, when the word "string" is already taken to
mean something else.
-Steve