Andrei Alexandrescu (See Website For Email) wrote:
> Don Clugston wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> Similarly, let's say that a group of revolutionaries convinces Walter
>>> (as I understand happened in case of using "length" and "$" inside
>>> slice expressions, which is a shame and an absolute disaster that
>>> must be undone at all costs) to implement "auto"
>>
>> This off-hand remark worries me. I presume that you mean being able to
>> reference the length of a string, from inside the slice? (rather than
>> simply the notation).
>> And the problem being that it requires a sliceable entity to know its
>> length? Or is the problem more serious than that?
>> It's worrying because any change would break an enormous amount of code.
>
> It would indeed break an enormous amount of code, but "all costs"
> includes "enormous costs". :o) A reasonable migration path is to
> deprecate them soon and make them illegal over the course of one year.
>
> A small book could be written on just how bad language design is using
> "length" and "$" to capture slice size inside a slice expression. I
> managed to write two lengthy emails to Walter about them, and just
> barely got started. Long story short, "length" introduces a keyword
> through the back door, effectively making any use of "length" anywhere
> unrecommended and highly fragile.
That hadn't occurred to me, but you're right. I never use length in
that context precisely because it does look like it could be a local
identifier, whereas I know it'll be clear it's not if I use $. Also
"length" is just too long to be of much use to me as a shortcut. If I'm
going to be that verbose I might as well type out the whole
"varname.length".
> Using "$" is a waste of symbolic real
> estate to serve a narrow purpose; the semantics isn't naturally
> generalized to its logical conclusion;
I do use this one, but I agree. It is unnecessarily special-cased for
built-in array types. For user-defined types, in 'myvar[0..$]' the $
does not expand to 'myvar.length' as one would naturally expect it to.
Or any sort of opLength() call. It's just a syntax error.
> and the choice of symbol itself
> as reminiscent of Perl's regexp is at best dubious ("#" would have
> been vastly better as it has count connotation in natural language, and
> making it into an operator would have fixed the generalization issue).
I think you'll have to admit that's just your personal taste there.
Using $ to indicate 'end' is a regexp thing, but regexps go way beyond
Perl.
I don't really care what it is as long as there's a terse way to
specify 'the end' in an indexing expression.
> As things stand now, the rules governing the popping up of "length" and
> "$" constitute a sudden boo-boo on an otherwise carefully designed
> expression landscape.
After trying to write a multi-dimensional array class, my opinion is
that D slice support could use some upgrades overall. What I'd like to see:
--MultiRange Slice--
* A way to have multiple ranges in a slice, and a mix of slice and
non-slice indices:
A[i..j, k..m]
A[i..j, p, k..m]
I'm not saying built-in arrays like int[] should allow the above
expressions, but that at least user types should be allowed to have such
opSlice methods. (Currently opSlice is limited to two arguments,
which represent the values on either side of a single '..'
token. You can have at most two arguments, but they can be of
any type.)
The problem is that opSlice has to look like opSlice(T1 lo, T2 hi) right
now -- just two parameters (or zero).
One possible solution is to turn a single i..j into a single int[2]
argument (or a mytype[2], for the general case). But that means one
won't be able to distinguish A[[1,3]] from A[1..3]. It also means more
interesting extensions to slice syntax, like adding a stepsize on a
range, will be ruled out.
Another solution is a built-in slice type. Ranges like a..b would get
converted to slice instances automatically. It would basically be a
struct with two ints in the simplest case, but to support user types as
indexes it would need to be template-like, i.e. slice!(type). A slice
would look basically like
struct slice(T=int) { T lo,hi; }
It could also have a .step property. With the above, lo and hi would
have to be of the same type, but really it makes sense to let them
differ, so slice!(T1,T2). For a range with stepsize,
slice!(Tlo,Thi,Tstep).
To make writing opSlice methods sane, a single number like the p above
should be converted to a slice also. So all arguments passed to opSlice
would be of type slice, and in the simple case of integer indices, it
would just be:
Type opSlice(slice s) { return x[s.lo..s.hi]; }
since integers would be the default types for slice.
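Python, whose slice syntax comes up again below, already behaves much like this proposal, so it makes a convenient analogy (this is Python, not D). In `g[i:j, k:m]` the subscript reaches `__getitem__` as a tuple of built-in `slice` objects, each with `start`, `stop` and `step` fields, mixed freely with plain ints. A minimal sketch, assuming a hypothetical `Grid` class; turning a lone index into a one-element range follows the suggestion above:

```python
class Grid:
    """A tiny row-major 2-D array (hypothetical stand-in for a D matrix type)."""

    def __init__(self, rows, cols):
        self.data = [[r * cols + c for c in range(cols)] for r in range(rows)]

    @staticmethod
    def _as_slice(key):
        # A lone index p becomes the one-element range p..p+1, so the rest of
        # __getitem__ only ever sees slice objects. "or None" keeps p == -1
        # working: slice(-1, None) is the last element, slice(-1, 0) is empty.
        if isinstance(key, int):
            return slice(key, key + 1 or None)
        return key

    def __getitem__(self, keys):
        # g[i:j, k:m] arrives as the tuple (slice(i, j, None), slice(k, m, None)).
        if not isinstance(keys, tuple):
            keys = (keys, slice(None))
        row_sel, col_sel = (self._as_slice(k) for k in keys)
        return [row[col_sel] for row in self.data[row_sel]]


g = Grid(3, 4)            # rows: [0..3], [4..7], [8..11]
print(g[0:2, 1:3])        # [[1, 2], [5, 6]]
print(g[0:2, 3])          # [[3], [7]]  (mixed range and single index)
print(g[1, ::2])          # [[4, 6]]    (a step comes for free)
```

This sketch keeps a length-1 dimension for a lone index; whether a single index should instead drop that dimension is a separate design choice.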
--User Definable '$'--
* A way to specify 'the end' in user types. In the general case the
meaning of '$' in a slice cannot be known (because any type can be used
as an index), nor can it be simply substituted with something like a
.length property, because it may depend on context. Consider a
multi-dimensional array class --
A[0..$,3..$]
The first $ means one thing, and the second one means another.
One solution - make an opLength that gets called with the parameter
number in which the $ appears. [My hypothesis is that the param# is the
only context that ever matters in determining the meaning of $.] So in
the above int opLength(int i) would get called twice, once with i==0,
once with i==1. opLength can be made to return any type if the user
just wants it to get 'passed through' to the opSlice call. If you don't
need the context you can define it as opLength().
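A minimal Python sketch of the opLength idea, again only as an analogy: Python has no `$`, so a hypothetical `DOLLAR` sentinel stands in for it, and `op_length(dim)` is a hypothetical hook modeled on the proposed `opLength(int i)`. Python passes arbitrary objects through slice syntax, so `m[0:DOLLAR, 3:DOLLAR]` reaches `__getitem__` intact and each sentinel can be resolved against its own argument position:

```python
DOLLAR = object()  # hypothetical stand-in for '$'; Python has no such token


class Matrix:
    def __init__(self, rows, cols):
        self.shape = (rows, cols)

    def op_length(self, dim):
        # Hypothetical opLength(int i): the argument position is the only
        # context consulted when deciding what '$' means.
        return self.shape[dim]

    def __getitem__(self, keys):
        # Replace each DOLLAR with op_length(dim) for the position it sits in.
        # Returns the resolved (lo, hi) bounds; a real class would slice here.
        def fix(bound, dim):
            return self.op_length(dim) if bound is DOLLAR else bound

        return [(fix(k.start, dim), fix(k.stop, dim)) for dim, k in enumerate(keys)]


m = Matrix(5, 7)
print(m[0:DOLLAR, 3:DOLLAR])   # [(0, 5), (3, 7)]
```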
--Step sizes--
This is a handy feature of Python slices. The general syntax for a
slice in Python is lo:hi:step, meaning go from 'lo' to 'hi', stepping by
'step' at a time. But any of the 3 components can be left out.
lo:hi means step=1.
lo::2 means go to the end, stepping by 2.
:hi means 0 to hi.
Negative steps are also allowed:
hi:lo:-1 means go backwards from hi down to, but not including, lo
::-1 means go backwards from the last element to the first
D syntax could be something like lo..hi:step. I like the omission part
of Python's syntax. If D had that then most uses of $ would go away
since we'd have A[3..] as an alternative to A[3..$].
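For reference, this is how those forms behave in Python itself. The stop bound is exclusive in both directions, so a negative step stops just before `lo`:

```python
a = list(range(10))                       # [0, 1, ..., 9]

assert a[2:8] == [2, 3, 4, 5, 6, 7]       # lo:hi, step defaults to 1
assert a[3::2] == [3, 5, 7, 9]            # lo::2 runs to the end by 2
assert a[:4] == [0, 1, 2, 3]              # :hi starts at 0
assert a[7:2:-1] == [7, 6, 5, 4, 3]       # hi:lo:-1 excludes lo itself
assert a[::-1] == [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  # whole list reversed
```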
--bb

== Quote from Andrei Alexandrescu (See Website For Email)
(SeeWebsiteForEmail@erdani.org)'s article
> Don Clugston wrote:
> > Andrei Alexandrescu (See Website For Email) wrote:
> >> Similarly, let's say that a group of revolutionaries convinces Walter
> >> (as I understand happened in case of using "length" and "$" inside
> >> slice expressions, which is a shame and an absolute disaster that must
> >> be undone at all costs) to implement "auto"
> >
> > This off-hand remark worries me. I presume that you mean being able to
> > reference the length of a string, from inside the slice? (rather than
> > simply the notation).
> >
> > And the problem being that it requires a sliceable entity to know its
> > length? Or is the problem more serious than that?
> > It's worrying because any change would break an enormous amount of code.
>
> It would indeed break an enormous amount of code, but "all costs"
> includes "enormous costs". :o) A reasonable migration path is to
> deprecate them soon and make them illegal over the course of one year.
>
> A small book could be written on just how bad language design is using
> "length" and "$" to capture slice size inside a slice expression. I
> managed to write two lengthy emails to Walter about them, and just
> barely got started. Long story short, "length" introduces a keyword
> through the back door, effectively making any use of "length" anywhere
> unrecommended and highly fragile. Using "$" is a waste of symbolic real
> estate to serve a narrow purpose; the semantics isn't naturally
> generalized to its logical conclusion; and the choice of symbol itself
> as reminiscent of Perl's regexp is at best dubious ("#" would have
> been vastly better as it has count connotation in natural language, and
> making it into an operator would have fixed the generalization issue).
> As things stand now, the rules governing the popping up of "length" and
> "$" constitute a sudden boo-boo on an otherwise carefully designed
> expression landscape.
I guess the question is what the best alternative is. I agree about
'length', and I usually don't use "length" in this way, but I do things
like x[$-2..$] all the time. Some proposals:
1. Symbols
Going in the symbol direction, it might also make sense to *add* something
like "^" for the start of a container. This would be useful with AAs and
user defined types. We could use both: a[^+2..$-2]. This would only really
be useful with containers that did not index from 0, i.e. non-integer or AA
indices.
char[char[]] words;
words[^.."brink"]; // all words in dictionary before 'brink'
words["brack"..$]; // all words from 'brack' onward
Which could translate to:
words.opSlice(words.opBegin(), "brink")
words.opSlice("brack", words.opEnd())
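A sketch of what such an opBegin/opEnd/opSlice translation could do for an associative container, in Python with a sorted key list standing in for the dictionary. `Words`, `op_begin`, `op_end` and `op_slice` are hypothetical names modeled on the proposed `opBegin`, `opEnd` and `opSlice`, and the half-open `[lo, hi)` convention is an assumption chosen to match D's array slices:

```python
import bisect

_END = object()  # hypothetical "one past the last key" marker


class Words:
    def __init__(self, words):
        self.keys = sorted(words)

    def op_begin(self):
        # The smallest key works as a lower bound for any range.
        return self.keys[0] if self.keys else ""

    def op_end(self):
        return _END

    def op_slice(self, lo, hi):
        # Half-open key range [lo, hi), e.g. words["brack".."brink"].
        i = bisect.bisect_left(self.keys, lo)
        j = len(self.keys) if hi is _END else bisect.bisect_left(self.keys, hi)
        return self.keys[i:j]


w = Words(["cat", "brink", "apple", "brick"])
print(w.op_slice(w.op_begin(), "brink"))   # ['apple', 'brick']
print(w.op_slice("brack", w.op_end()))     # ['brick', 'brink', 'cat']
```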
2. I like this better: I call it "with without with"
In order to maximize the dollar value :) of syntax symbol real estate, the
meaning of $ could be expanded as follows:
Something like X[$begin..$end] could be a shortcut for either X[0..X.length]
for arrays, or X[X.opBegin()..X.opEnd()] for user types.
I think the above solves the problem, doesn't it? The "$end" phrase is terse
enough for most coders, unique enough to avoid namespace conflicts, avoids the
problem of keywords ghosting in and out of existence in mid-expression, and
avoids ruining $ (or # if #end is used instead) for the symbol space.
We can stop right there... or go on to something for post-1.0:
Other applications of $ could be:
A. Syntax reduction for enumerated types and fields:
struct Colors {
enum { red, green, blue };
void set(int c);
};
Colors c;
c.set($red);
This use of enumerated types is becoming more common; having "$" be a
shortcut for <context>.X might make a lot of code more readable. The
question then becomes: "Which contexts are searched for .X?"
B. Reserved for language features.
Leave this open for language designer use. All $xyz expressions are
context-dependent keywords. This allows much shorter words to be used, and allows
language features to be named intelligently without worrying about crashing
into user-defined names. For example, C could never introduce a new keyword
called "begin" or "end", since it would break nearly every C program, but we
can easily add a keyword called $begin which will not conflict with anything,
since the $ saves us from conflicts.
Most of the discussions of new features here include at least some argument about
how to add the new syntax for the feature, what else those symbols could
be used for, etc. The $xyz route allows Walter to introduce lots of language
concepts in the future without conflicts. It could even be used to prototype
keywords that are experimental. They can even be removed or promoted to non-$
status later if desired.
NOTE that if # was used instead of $, it would dovetail nicely with the
"#line" and "#file" quasi-keywords.
> > These issues you're raising seem to be far too fundamental to be fixed
> > in the next few days, casting grave doubts on whether a D1.0 release on
> > Jan 1 is a good idea.
>
> The lvalue/rvalue issue is fundamental. I'm not in the position to
> assess whether it's a maker or breaker of D 1.0.
>
> The "length"/"$" issue is not fundamental the same way that C's
> declaration syntax, Java's throw specifications, C++'s use of "<" and
> ">" for templates, and Mao Zedong's refusal to use a toothbrush are not
> fundamental. It will "just" go down in history as a huge embarrassment
> and a good resource for cheap shooters and naysayers. If I understand
> its genesis, it will also be a canonical example of why design by
> committee is bad.
>
> Andrei
I like the terseness of "$" but I'm willing to do away with it if it
really is that bad. What I'm wondering, is how far do you think we
need to roll back the syntax, before it's "The Right Thing" (tm) again?
Do we really need to go all the way to myarray[0..myarray.length], or
can some intermediate solution work?
Kevin

Benji Smith wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Let me illustrate further why ident is important and what solution we
>> should have for it. Consider C's response to ident:
>>
>> #define IDENT(e) (e)
>>
> > ...
> >
>> ...leading to the following implementation of ident:
>>
>> auto ident(auto x) {
>> return x;
>> }
>
> I don't get it.
>
> Why is it necessary (or even desirable) for functions to return lvalues?
Methods might want to return lvalues, but indeed the need is not
overwhelming. (They could return pointers after all.) But the point is
different. You want to have a grip on all types, and ident shows that
you can't. For example, in current D you can't (barring a hack that I
saw in a post around here) have a template that takes a function and
creates one of the exact signature. That is a vastly useful and
desirable thing to want; think e.g. of a function that memoizes any
other function.
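Andrei's point is about static types; in a dynamically typed language a wrapper can simply forward its arguments. For comparison only, here is a Python memoizer whose result still reports the wrapped function's signature via `functools.wraps`. The wrapper itself takes `*args` (positional, hashable arguments are assumed); `inspect.signature` shows `(n: int) -> int` only because it follows the `__wrapped__` attribute that `wraps` sets. That kind of exact-signature forwarding is what, per the post above, current D can't express without a hack:

```python
import functools
import inspect


def memoize(fn):
    """Cache fn's results by positional arguments (they must be hashable)."""
    cache = {}

    @functools.wraps(fn)  # copies __name__, __doc__, ... and sets __wrapped__
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return wrapper


@memoize
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))                  # 832040
print(inspect.signature(fib))   # (n: int) -> int
```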
Andrei

Derek Parnell wrote:
> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
> Email) wrote:
>
>
>> A small book could be written on just how bad language design is using
>> "length" and "$" to capture slice size inside a slice expression. I
>> managed to write two lengthy emails to Walter about them, and just
>> barely got started.
>
> Please share your thoughts here if you can too.
Gladly; I dug up my email, so let me share a couple of excerpts.
---------
int length = 5;
int[] a = new int[length * 2];
int[] b = a[length .. length * 2];
int[] c = a[length - 1 .. (b[0 .. length])[0]];
In each of its uses, length has different semantics. The behavior is
well-defined for all cases, but nonintuitive and about as pleasant as
nails on the blackboard.
Now D has a compile-time option to ban the "length" name in scopes in
which the slice operator is used. That would render the example above
illegal. There is also a rule that identifiers in nested scopes cannot
mask one another. So length will be banned from *any* scope that nests a
scope using a slice:
int length;
if (a) {
    foreach (b; c) {
        while (d) {
            switch (e) {
                case f: g = h[0 .. length - 1];
                ...
            }
        }
    }
}
This code will not compile. Worse, it *will* compile until you add the
slice operation. Combining the two rules and taking them to their
logical conclusion, any code using "length" is frail because there's
always a risk that somebody might insert a slice, rendering the entire
function uncompilable. What happened is that now "length" has become a
backdoor-introduced keyword. Books will advise users to never use it
even when it works, coding standards will ban it, language lawyers will
use it to detract from D, and users of other languages will smile
condescendingly and stay with their languages.
There are a few ways out of it. "length" could be actually made a
keyword. But even that one isn't very uniform, and steals yet another
good identifier name.
Another way out of it is to ban "length" but stick with "$". But "$" has
another bunch of problems. It's a special character used only once, and
only in a very particular situation. There is no general concept
standing behind its usage: it sticks out like a sore thumb. "$" isn't
the last index in an array. It's that only when used inside a slice, and
refers only to the innermost index of the array. Quite a waste of a
special character, and for little benefit.
But if we made "$" into an operator identifying the last element of
_any_ array, which could refer to the last element of _the left-hand
side_ array if we so want, then all of a sudden it becomes useful in a
myriad of situations:
int i = a[$ - 1]; // get last element
int i = a[$b - 1]; // get a's element at position b.length - 1
if (a[$ - 1] == x) { ... }
if ($a > 0) { ... }
if ($a == $b) { ... }
swap(a[0], a[$ - 1]); // swap first and last element
---------------
Grammar for nullary/unary $:
---------------
I think I nailed down the way the count operator $ can work in a manner
that's terse, expressive, and safe.
My basic goal is to enable the operator $ to be unary (applying to an
array) to return its size, and also nullary (applying to nothing) to
implicitly mean "fetch the size of the innermost array in the
expression". So this code should work:
int[] foo;
foo[$ - 1]; // refers to foo's last element
foo[$foo - 1]; // same
int[][] bar;
bar[foo[$ - 1]]; // refers to bar indexed with foo's last element
bar[foo[$bar]]; // refers to bar indexed with foo's element at $bar
To insert my operator $ within D's grammar, go to the grammar page:
http://www.digitalmars.com/d/expression.html$UnaryExpression and scroll
down to Unary Expression. There, add the following rules:
UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ Identifier
    $ PostfixExpression . Identifier
    $ PostfixExpression ( )
    $ PostfixExpression ( ArgumentList )
    $ IndexExpression
    $ SliceExpression
    $ ArrayLiteral
    $ ( Expression )
Now a unary expression can be the $ operator followed by an identifier,
a member access, a function call, an array access, or a slice expression
(awesome! pick the size of the slice!), a literal array (for
conformity), or a parenthesized expression. Perfect!
But we haven't yet filled the role of $ as a nullary operator. To do so,
let's go in the grammar to
http://www.digitalmars.com/d/expression.html$PrimaryExpression and
append one more rule to the PrimaryExpression rule:
PrimaryExpression:
    Identifier
    .Identifier
    ... etc. etc. ...
    $
Now the grammar is unambiguous and will properly distinguish unary and
nullary uses of the $ operator.
This is more elegant than the current crap with "$" and "length" popping
up. Besides, you can now use $ in many more places than inside []s.
However, the grammar size does increase quite a bit, which is more fuss
than I hoped for just one operator.
A simpler grammar would have been to simply allow:
UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ PostfixExpression
But this would have been ambiguous. If the compiler sees "$-1", then the
bad grammar says that's a unary use of $ because -1 is a
PostfixExpression. But that's not what we wanted! We wanted $ to be
nullary. That's why I needed to put all the cases in UnaryExpression.
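To make the nullary/unary split concrete, here is a small recursive-descent sketch in Python covering a tiny subset of the proposal: integer literals, identifiers, binary `+`/`-`, unary `-`, indexing, and `$`. Following the restricted rules above, `$` is treated as unary only when the next token can start one of the listed forms (here just an identifier or `(`); otherwise it is nullary and means the length of the innermost array being indexed. This is not D's grammar or parser; the tokenizer, the `DollarEval` class and the environment of named int lists are hypothetical scaffolding:

```python
import re


def is_ident(tok):
    return tok is not None and re.match(r"[A-Za-z_]", tok) is not None


class DollarEval:
    """Evaluates a tiny subset of the proposed $ syntax over named int lists."""

    def __init__(self, env):
        self.env = env      # name -> list of ints
        self.arrays = []    # arrays whose [...] we are currently inside

    def run(self, src):
        self.toks = re.findall(r"\d+|[A-Za-z_]\w*|[-+$()\[\]]", src)
        self.pos = 0
        value = self.expr()
        if self.pos != len(self.toks):
            raise SyntaxError("trailing tokens")
        return value

    def peek(self, ahead=0):
        i = self.pos + ahead
        return self.toks[i] if i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):                      # Unary (('+' | '-') Unary)*
        value = self.unary()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.unary()
            value = value + rhs if op == "+" else value - rhs
        return value

    def unary(self):
        # Unary $ only when the next token can start one of the listed forms.
        if self.peek() == "$" and (self.peek(1) == "(" or is_ident(self.peek(1))):
            self.take("$")
            return len(self.postfix())
        if self.peek() == "-":
            self.take("-")
            return -self.unary()
        return self.postfix()

    def postfix(self):                   # Primary ('[' Expression ']')*
        value = self.primary()
        while self.peek() == "[":
            self.take("[")
            self.arrays.append(value)    # a nullary $ in here means len(value)
            index = self.expr()
            self.arrays.pop()
            self.take("]")
            value = value[index]
        return value

    def primary(self):
        tok = self.take()
        if tok == "$":                   # nullary $: innermost array's length
            return len(self.arrays[-1])
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok.isdigit():
            return int(tok)
        return self.env[tok]


ev = DollarEval({"foo": [10, 20, 30], "bar": list(range(100, 140))})
print(ev.run("foo[$-1]"))          # 30  ($ followed by '-' is nullary)
print(ev.run("foo[$foo - 1]"))     # 30  (unary $ applied to foo)
print(ev.run("bar[foo[$ - 1]]"))   # 130 (the inner $ is len(foo))
print(ev.run("$(foo) + $bar"))     # 43
```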
Andrei

Andrei Alexandrescu (See Website For Email) wrote:
> Derek Parnell wrote:
>> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
>> Email) wrote:
>>
>>
>>> A small book could be written on just how bad language design is
>>> using "length" and "$" to capture slice size inside a slice
>>> expression. I managed to write two lengthy emails to Walter about
>>> them, and just barely got started.
>>
>> Please share your thoughts here if you can too.
>
> Gladly; I dug up my email, so let me share a couple of excerpts.
<snipped excerpts>
Wow, I understand it now. I only hope that at least 'length' will be
deprecated before 1.0.
I like your dollars. I'm not so good with grammars, will your proposal
also work for user defined types?

Lutger wrote:
> Wow, I understand it now. I only hope that at least 'length' will be
> deprecated before 1.0.
>
> I like your dollars.
Well, just don't take'em away from my bank account :o).
> I'm not so good with grammars, will your proposal
> also work for user defined types?
The plan is that $expression is rewritten into (expression).length. The
consistent thing to do is to make that into an opXyz() function, but I
don't find this name inconsistency jarring.
Andrei

Bill Baxter wrote:
> After trying to write a multi-dimensional array class, my opinion is
> that D slice support could use some upgrades overall.
I'd be very interested in looking at what you've come up with. With my
own implementation of a multi-dimensional array type a couple of months
ago, I came to the same conclusion. I posted about it in:
news://news.digitalmars.com:119/edrv0n$hth$1@digitaldaemon.com
http://www.digitalmars.com/d/archives/digitalmars/D/announce/4717.html
> What I'd like to see:
>
> --MultiRange Slice--
> * A way to have multiple ranges in a slice, and a mix of slice and
> non-slice indices:
> A[i..j, k..m]
> A[i..j, p, k..m]
(snip)
> A[0..$,3..$]
Yes, I would too. It is quite frustrating having the syntax in the
language but not being allowed to utilize it... :)
I work around this by using a custom slice syntax instead:
A[range(i,j), range(k,m)]
A[range(i,j), p, range(k,m)]
A[range(0,end), range(3,end)]
A[end-1, p % end]
Basically, the transformation is:
$ => end
a..b => range(a,b)
I briefly described this in:
news://news.digitalmars.com:119/eft9id$2aq3$1@digitaldaemon.com
The resulting code comes out close to optimal without the need for a
position-dependent opLength type of operator, but handling all the cases
puts a larger burden on the implementor of opIndex.
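A rough Python rendering of that workaround, with hypothetical names throughout: `end` is an object whose arithmetic builds a deferred function of a length, `Rng(a, b)` stands in for the `range(a, b)` above (renamed to avoid shadowing Python's built-in `range`), and each index is resolved against the length of the dimension it appears in. It also shows where the extra burden lands: `__getitem__`, like opIndex, has to handle every combination of ranges, plain indices and deferred expressions:

```python
class End:
    """A deferred expression over a dimension's length ($ => end)."""

    def __init__(self, fn=lambda n: n):
        self.fn = fn

    def __add__(self, k):
        return End(lambda n: self.fn(n) + k)

    def __sub__(self, k):
        return End(lambda n: self.fn(n) - k)

    def __rmod__(self, k):  # makes `p % end` work
        return End(lambda n: k % self.fn(n))


end = End()


class Rng:
    """Stand-in for range(a, b), i.e. a..b => Rng(a, b)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi


def resolve(x, n):
    # Evaluate a possibly deferred bound against a dimension of length n.
    return x.fn(n) if isinstance(x, End) else x


class Array2D:
    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, keys):
        rkey, ckey = keys
        nr, nc = len(self.rows), len(self.rows[0])

        def pick(seq, key, n):
            if isinstance(key, Rng):
                return seq[resolve(key.lo, n):resolve(key.hi, n)]
            return seq[resolve(key, n)]

        picked = pick(self.rows, rkey, nr)
        if isinstance(rkey, Rng):
            return [pick(row, ckey, nc) for row in picked]
        return pick(picked, ckey, nc)


A = Array2D([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(A[Rng(0, end), Rng(1, end)])   # [[1, 2], [4, 5], [7, 8], [10, 11]]
print(A[end - 1, 7 % end])           # 10
```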
> The problem is that opSlice has to look like opSlice(T1 lo, T2 hi) right
> now -- just two parameters (or zero).
[snip]
> Another solution is a built-in slice type. Ranges like a..b would get
> converted to slice instances automatically.
Yes, this would be my suggestion too. Adding an opApply to one such
built-in range type would also have the nice side effect of allowing the
syntactical sugar:
foreach(i; 5..10)
> --User Definable '$'--
[snip]
> One solution - make an opLength that gets called with the parameter
> number in which the $ appears.
Yes, that is probably the cleanest solution. And if no such
opLength(int) overload exists, return the result of opLength() (or
possibly .length)
/Oskar

Andrei Alexandrescu (See Website For Email) wrote:
>
> A simpler grammar would have been to simply allow:
>
> UnaryExpression:
> PostfixExpression
> & UnaryExpression
> ... etc. etc. ...
> $ PostfixExpression
>
> But this would have been ambiguous. If the compiler sees "$-1", then the
> bad grammar says that's a unary use of $ because -1 is a
> PostfixExpression. But that's not what we wanted! We wanted $ to be
> nullary. That's why I needed to put all the cases in UnaryExpression.
>
Nice post, and one heck of an argument!
FWIW, I advocated something similar during the last round of debates
before the '$' operator was introduced. What I wanted was for '$' to
become like 'this' within slice and array expressions, so that the
issues regarding 'length' could be resolved. In essence one could
simply say '$.length' and mean 'the length of the current array':
b[0 .. $.length];
a[0 .. $.getIndexOf(';')];
So in essence, every use of '$' would be a 'nullary' operator - an alias
if you will.
I'd imagine that extending things in this manner would simplify things
grammatically while allowing for a wider category of uses. However, it
doesn't solve the issue that you brought up, and that I've quoted above.
c[$-1];
It looks like it should be an implicit cast of the '$' to a size_t
(length), via its use in an expression. Any thoughts on this?
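This variant treats `$` as an alias for the array itself, so any expression over it can serve as a bound. A Python sketch of the idea, assuming a hypothetical convention in which a callable bound is invoked with the underlying sequence; `Seq` and `_bound` are illustrative names, and `d.index(";")` plays the role of `$.getIndexOf(';')` in the example above:

```python
class Seq:
    """Hypothetical wrapper where a callable bound receives the sequence ('$')."""

    def __init__(self, items):
        self.items = items

    def _bound(self, x):
        # A callable bound plays the role of an expression over '$'.
        return x(self.items) if callable(x) else x

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.items[self._bound(key.start):self._bound(key.stop)]
        return self.items[self._bound(key)]


s = Seq("key=value;rest")
print(s[0:(lambda d: d.index(";"))])   # key=value
print(s[(lambda d: len(d) - 1)])       # t
```

Here `len(d) - 1` is spelled out explicitly, which sidesteps the question of implicitly converting `$` to a length inside `c[$-1]`.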
--
- EricAnderton at yahoo