Another debate at the time was caused by a disagreement between pcc and cc
regarding enums: are they a type or just a way to declare constants? I
remember getting annoyed by pcc not letting me declare a constant with an
enum and use it as an int. I protested to scj and dmr and after some to-ing
and fro-ing Steve changed pcc to treat them as constants.
Not sure it was the right decision, but C desperately wanted a non-macro
way to define a constant. I'd probably argue the same way today. The real
lesson is how propinquity affects progress.
-rbo
On Sat, Apr 25, 2020 at 12:51 PM Rob Pike <robpike@gmail.com> wrote:
> The ability to call a function pointer fp with the syntax fp() rather than
> (*fp)() came rather late, I think at Bjarne's suggestion or example. Pretty
> sure it was not in v7 C, as you observe.
>
> Convenient though the shorthand may be, it always bothered me as
> inconsistent and misleading. (I am pretty sure I used it sometimes
> regardless.)
>
> -rob
>
>
> On Sat, Apr 25, 2020 at 12:48 PM Adam Thornton <athornton@gmail.com>
> wrote:
>
>>
>>
>> On Apr 24, 2020, at 7:37 PM, Charles Anthony <charles.unix.pro@gmail.com>
>> wrote:
>>
>>
>>
>> On Fri, Apr 24, 2020 at 7:00 PM Adam Thornton <athornton@gmail.com>
>> wrote:
>>
>>> This doesn’t like the function pointer.
>>>
>>
>>> $ cc -c choparg.c
>>> choparg.c:11: Call of non-function
>>>
>>> Perhaps:
>>
>> (*fcn)(arg);
>>
>>
>> We have a winner!
>>
>> Also, Kartik, dunno where it is on the net, but if you install a v7
>> system, /usr/src/cmd/c
>>
>> Adam
>>
>>

On Sat, 25 Apr 2020, Rob Pike wrote:
> The ability to call a function pointer fp with the syntax fp() rather
> than (*fp)() came rather late, I think at Bjarne's suggestion or
> example. Pretty sure it was not in v7 C, as you observe.
I have never seen that syntax used (and I've been tooling around with Unix
for decades). The variable "fp" in an argument list is a pointer to the
function, not the function itself, so dereference it.
I wouldn't put it past Stroustrup to have it in C++ though, as it pretty
much has everything else in it.
> Convenient though the shorthand may be, it always bothered me as
> inconsistent and misleading. (I am pretty sure I used it sometimes
> regardless.)
Indeed... My principle is to write code as though the next person to
maintain it is a psychopathic axe-murderer who knows where you live (or
perhaps even yourself, a year later)...
-- Dave

> From: Rob Pike
> Convenient though the shorthand may be, it always bothered me as
> inconsistent and misleading.
As someone who made very extensive use of procedure pointers (most notably in
upcalls, which never caught on, alas), I couldn't agree more.
Two very different things are happening, but with the shorthand notation,
they share an identical representation. And for what? To save three characters?
Noel

On Sa, 2020-04-25 at 09:11 -0400, Noel Chiappa wrote:
> > From: Rob Pike
>
> > Convenient though the shorthand may be, it always bothered me as
> > inconsistent and misleading.
>
> As someone who made very extensive use of procedure pointers (most notably in
> upcalls, which never caught on, alas), I couldn't agree more.
>
> Two very different things are happening, but with the shorthand notation,
> they share an identical representation. And for what? To save three characters?
The subject can be looked at from another angle. Consider
the call f(42). This might be read as first naming f (and
thus constructing a pointer to f) and then calling the
function which the pointer is pointing to. So at least
it should be possible to write the call as (*f)(42), which
indeed is equivalent to f(42). So it can be argued that
this notational shorthand should be allowed with all
function pointers.
Hellwig

> From: Rob Pike
> To make chaining of calls simpler. Write
> f()->g()->h()->i()
> the other way
You mean:
(*f)((*g)((*h)((*i)())))
I dunno, it doesn't seem that much worse to me.
What I like about the explicit notation (i.e. (*f) ()) is that it forces the
programmer to recognize what's going on.
On the other hand, I guess, the whole concept of compiled languages is to get
the programmer's nose out of the low-level details, so they can focus on the
high level. So I guess one could see allowing f() in place of (*f)() as an
instance of that.
Then again, down that road you find a lot of modern code, where a programmer
writes something that is e.g. horribly inefficient and slow, precisely because
they are so divorced from the low-level of what the code they wrote turns into...
Still, I'd be a little worried about a program doing (*f)((*g)((*h)((*i)()))),
no matter what the notation was; it would be awfully hard to recognize what
all the possible call chains are. But then again I guess a lot of e.g. AI code
does things like that...
Noel

On Saturday, April 25, 2020, 09:52:45 AM EDT, Hellwig Geisse <hellwig.geisse@mni.thm.de> wrote:
> On Sa, 2020-04-25 at 09:11 -0400, Noel Chiappa wrote:
> > Two very different things are happening, but with the shorthand notation,
> > they share an identical representation. And for what? To save three characters?
>
> The subject can be looked at from another angle. Consider
> the call f(42). This might be read as first naming f (and
> thus constructing a pointer to f) and then calling the
> function which the pointer is pointing to.
This is the way that I've taken to looking at it for the
last 10 years or so. In fact, I see it as the same thing
as an array. Specifically, I've taken to thinking of []
as a postfix indexing operator and () as a postfix
calling operator, and the thing on the left is a pointer
in both cases.
BLS

On 25 Apr 2020 19:01 +0000, from blstuart@bellsouth.net (Brian L. Stuart):
> On Saturday, April 25, 2020, 09:52:45 AM EDT, Hellwig Geisse <hellwig.geisse@mni.thm.de> wrote:
>> The subject can be looked at from another angle. Consider
>> the call f(42). This might be read as first naming f (and
>> thus constructing a pointer to f) and then calling the
>> function which the pointer is pointing to.
>
> This is the way that I've taken to looking at it for the
> last 10 years or so. In fact, I see it as the same thing
> as an array. Specifically, I've taken to thinking of []
> as a postfix indexing operator and () as a postfix
> calling operator, and the thing on the left is a pointer
> in both cases.
That's an interesting way of looking at it.
I was thinking: couldn't we apply the same kind of reasoning to
variables as well?
Bear with me for a second.
If we have
int z = 123;
then "z" is a mnenomic way of referring to an int-sized memory
location, which after initialization holds the value 123. In C, we can
take the address of any variable stored in memory, and we can
dereference any address into memory. (How _meaningful_ especially the
latter is varies, particularly on memory-protected architectures, but
it's still possible.)
So, is there any material difference between
printf("%d", z);
and
printf("%d", *(&z));
If there is, then certainly GCC isn't indicating that it's there. Both
print 123, and both variants compile cleanly even with -Wall -pedantic.
OpenBSD clang 8.0.1 cc also gives identical output for both variants.
So if "z" and "*(&z)" (take the address of z, then dereference that
address, then use the value stored there) are equivalent, then in the
name of consistency, why _shouldn't_ f() and (*f)() (dereference the
address of f, then call) also be equivalent? After all, what is "f"
here, other than a mnemonic name for a memory location?
--
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

On 25 Apr 2020 14:03 -0400, from jnc@mercury.lcs.mit.edu (Noel Chiappa):
> Then again, down that road you find a lot of modern code, where a programmer
> writes something that is e.g. horribly inefficient and slow, precisely because
> they are so divorced from the low-level of what the code they wrote turns into...
...and then there's an exceptionally complicated CPU execution
pipeline in which code is rearranged to try to allow the CPU to
execute it as fast as possible while preserving "observable" behavior.
As we know, down that road lies... security vulnerabilities.
That said, I agree; I don't know how many times I've nearly headdesked
coming across code that looks like someone typed the first thing that
entered their mind, instead of actually thinking the problem through
first and _then_ coding a solution. I'm almost certainly not innocent
there myself, either, although I do try.
--
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Noel Chiappa wrote in
<20200425194102.3A54318C0BE@mercury.lcs.mit.edu>:
|> From: moanga
|
|>> To make chaining of calls simpler. Write
|>> f()->g()->h()->i()
|
|Ah; I was confused by his notation; I didn't realize he meant the C \
|operator
|'->'.
Oh, I love this method-on-object as opposed to object-on-function
methodology.
while (_tr.read_line(*&_line) >= 0) {
        ln = _line.trim().squeeze().data();
        len = _line.length();
}
which can easily exceed an 80x2[45] screen in C. Especially with
"more modern" frameworks which try to avoid namespace pollution
and easily exceed 20 bytes for a single function name alone.
Of course error handling is often a problem unless you go for
exceptions (terrible, especially if they do not have language
builtin support for file and line number, at least without
-DNDEBUG, imho), general state machines or whatever.
While at C++, the checked automatic upcasts are also very helpful,
especially if you have a deeper object hierarchy. (As in, struct
object{}, struct drawable{struct object super...}, struct
button{struct drawable super..}, then "drawable d;.. object*=&d
(or for heaven's sake &d.super) is much much better as
&d.super.super or even (object*)&d.)
Damn, I have given up on perfection, and sometimes even on
"being explicit is better", but a shame it is.
Ciao, a nice Sunday and Good luck! from Germany,
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

On Saturday, April 25, 2020, 04:11:58 PM EDT, Michael Kjörling <michael@kjorling.se> wrote:
> That said, I agree; I don't know how many times I've nearly headdesked
> coming across code that looks like someone typed the first thing that
> entered their mind, instead of actually thinking the problem through
> first and _then_ coding a solution. I'm almost certainly not innocent
> there myself, either, although I do try.
I know that feeling all too well. I try to think of it in
the same terms as "Let him who is without sin cast
the first stone." But students coming to me with code
that is clearly created using the random walk method
of programming lead to me not always being as patient
with my counsel as I should be.
BLS

On Saturday, April 25, 2020, 04:17:14 PM EDT, Michael Kjörling <michael@kjorling.se> wrote:
> I was thinking: couldn't we apply the same kind of reasoning to
> variables as well?
> ...
In short, yes. In the language Bliss, all identifiers
stood for the address of that thing. A prefix dot (.)
dereferences that thing. So copying x to y would be
something like
y = .x;
In C, rvalues have an implicit dereference happening.
I've actually created a toy language that I subject
my students to that revives the Bliss view to drive
home in their minds the difference between the
address of a memory location and the contents
of a memory location. I want them to have some
concept of how the program connects to the machine
before they find themselves so mired in abstraction
that everything is treated as magic.
One of my TAs in that class last fall was taking a
class in the winter where she was using C seriously
for the first time and having trouble understanding
pointers. When I explained how C pointers worked
in terms of the variables and dots of this other
language it became much more clear for her.
BLS

"Brian L. Stuart" <blstuart@bellsouth.net> wrote:
> On Saturday, April 25, 2020, 09:52:45 AM EDT, Hellwig Geisse <hellwig.geisse@mni.thm.de> wrote:
> > On Sa, 2020-04-25 at 09:11 -0400, Noel Chiappa wrote:
> > > Two very different things are happening, but with the shorthand notation,
> > > they share an identical representation. And for what? To save three characters?
> >
> > The subject can be looked at from another angle. Consider
> > the call f(42). This might be read as first naming f (and
> > thus constructing a pointer to f) and then calling the
> > function which the pointer is pointing to.
>
> This is the way that I've taken to looking at it for the
> last 10 years or so. In fact, I see it as the same thing
> as an array. Specifically, I've taken to thinking of []
> as a postfix indexing operator and () as a postfix
> calling operator, and the thing on the left is a pointer
> in both cases.
>
> BLS
>
Algol 68 had a concept "deproceduring" similar to "dereferencing". If you
think of
foo(arg)
where plain "foo" is a pointer to a function and adding the parentheses
does the call, then it's the same with a procedure name or with
a function pointer.
This is pretty much what BLS said. Thinking of [] and () as operators
is explicit in C++ (for good and for ill).
Arnold

On Sat, Apr 25, 2020 at 02:03:57PM -0400, Noel Chiappa wrote:
> > From: Rob Pike
>
> > To make chaining of calls simpler. Write
> > f()->g()->h()->i()
> > the other way
>
> You mean:
>
> (*f)((*g)((*h)((*i)())))
>
> I dunno, it doesn't seem that much worse to me.
No, I think he means something like:
(*((*((*((*f)()->g))()->h))()->i))()
but I can't recall the relative priority of '*' and '->' in
the above, so I may have added unnecessary parens.
Or was he thinking of having to use '.' as well to access
the member pointers within the structs?
DF

On Sun, Apr 26, 2020 at 08:37:04PM +0100, Derek Fawcus wrote:
> No, I think he means something like:
>
> (*((*((*((*f)()->g))()->h))()->i))()
>
> but I can't recall the relative priority of '*' and '->' in
> the above, so I may have added unnecessary parens.
Actually trying it, while the above does the right thing,
I can also get the following to compile with a modern compiler
(*(*(*(*f)()->g)()->h)()->i)();
So maybe that was the answer?
I guess I'd have to question why someone would wish to write
such a construct, as error handling seems awkward. Even in
the modern form.
DF

> On Apr 26, 2020, at 13:10, Derek Fawcus wrote:
>
> I guess I'd have to question why someone would wish to write such a construct,
> as error handling seems awkward.
FWIW, I do most of my programming these days in Elixir. It's a functional
programming language with pervasive pattern matching, Rubyish syntax, and
Lispish macros. It runs on the Erlang virtual machine, so it has a good
story for Actor-based concurrency, distribution, etc. For details, see:
https://en.wikipedia.org/wiki/Elixir_(programming_language)
Anyway, compilation is mostly handled by Lispish macros, so it can support
some fairly cool metaprogramming. In particular, I can write things like:
out_map = inp_list
  |> Enum.filter(filter_fn)
  |> Enum.map(map_fn)
  |> Enum.reduce(%{}, reduce_fn)
Piped values are handed in as the first argument to each function and most
functions expect this behavior. For extra credit, there is a set of Stream
functions (really, macros) that process one element at a time and handle
errors in a reasonable manner.
-r

On 04/26/20 16:10, Derek Fawcus wrote:
> On Sun, Apr 26, 2020 at 08:37:04PM +0100, Derek Fawcus wrote:
>> No, I think he means something like:
>>
>> (*((*((*((*f)()->g))()->h))()->i))()
>>
>> but I can't recall the relative priority of '*' and '->' in
>> the above, so I may have added unnecessary parens.
> Actually trying it, while the above does the right thing,
> I can also get the following to compile with a modern compiler
>
> (*(*(*(*f)()->g)()->h)()->i)();
>
> So maybe that was the answer?
K&R 1, Sect. 6.2. (with no mention of Rob Pike's influence).
N.
>
> I guess I'd have to question why someone would wish to write
> such a construct, as error handling seems awkward. Even in
> the modern form.
>
> DF

Rob Pike <robpike@gmail.com> wrote:
> The ability to call a function pointer fp with the syntax fp() rather than
> (*fp)() came rather late, I think at Bjarne's suggestion or example. Pretty
> sure it was not in v7 C, as you observe.
I've seen some interesting discussion about Dave Horsfall's favourite
retro-C definition of abort():
int abort 4;
...
abort();
https://minnie.tuhs.org/pipermail/tuhs/2020-March/020680.html
In particular a lot of people didn't know that function pointers could not
be called like abort() so they didn't realise that 4 was the machine code
contents of the function, not the address of the function. (Extra
confusing since branching to address 4 was also a plausible way to crash
the program...)
But that made me wonder what 7th-and-earlier C would do if you tried to
call a local variable. I guess that would lead to the compiler saying
error("Call of non-function");
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Hebrides, Bailey, Fair Isle, Faeroes: Northeasterly 4 to 6, occasionally 7 at
first in north Fair Isle. Moderate or rough. Showers. Good, occasionally
moderate.

> From: Derek Fawcus
> I think he means something like:
> (*((*((*((*f)()->g))()->h))()->i))()
So I've been confused by this thread, and I'm hoping someone can deconfuse me
- but I think I may have figured it out.
What's confusing me is that in C, the -> operator is followed by "an
identifier [which] designates a member of a structure or union object" (I
checked the spec to make sure my memory hadn't dropped any bits) - but g, h
above are arguments; so I couldn't figure out what was going on.
I think what may have happened is that initially the discussion was about C
("Pretty sure it was not in v7 C"), but then it switched to C++ - with which
I'm not familiar, hence my confusion - without explicitly indicating that
change (although the reference to Bjarne Stroustrup should have been a clue). (And
that's why I thought "f()->g()->h()->i()" was ad hoc notation for "calls f(),
then calls g()".)
Am I tracking now?
Noel

g, h and i are members of structures, the pointer of which is returned by the preceding function call. They have to be defined as pointers to functions returning a pointer to the following structure.
A simple example is:
typedef struct Node Node;
struct Node {
	Node *(*f)(void);
};

static Node node;

static Node *
self(void)
{
	return &node;
}

int
main(void)
{
	Node *p = &node;

	node.f = self;
	p->f()->f()->f();             /* shorthand */
	(*(*(*p->f)()->f)()->f)();    /* longhand equivalent */
	return 0;
}
// (*((*((*((*f)()->g))()->h))()->i))()
> On Apr 27, 2020, at 1:45 PM, Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:
>
>> From: Derek Fawcus
>
>> I think he means something like:
>> (*((*((*((*f)()->g))()->h))()->i))()
>
> So I've been confused by this thread, and I'm hoping someone can deconfuse me
> - but I think I may have figured it out.
>
> What's confusing me is that in C, the -> operator is followed by "an
> identifier [which] designates a member of a structure or union object" (I
> checked the spec to make sure my memory hadn't dropped any bits) - but g, h
> above are arguments; so I couldn't figure out what was going on.
>
> I think what may have happened is that initially the discussion was about C
> ("Pretty sure it was not in v7 C"), but then it switched to C++ - with which
> I'm not familiar, hence my confusion - without explicitly indicating that
> change (although the reference to Bjarne Stroustrup should have been a clue). (And
> that's why I thought "f()->g()->h()->i()" was ad hoc notation for "calls f(),
> then calls g()".)
>
> Am I tracking now?
>
> Noel
>
>

Following up on Rob's comment, I always took the point of view that
Dennis owned the C description, and what he said goes. Not that I
didn't make suggestions that he accepted. One of the better ones
(actually in B) was ^ for exclusive OR. One of the worse ones was the
syntax for casts. We looked at about 5 different ideas and hated all of
them. And most of them couldn't be easily compiled with Yacc. So I
took the grammar for declarations, removed the variable name, and voila,
it expressed everything we wanted in the way of semantics, had a simple
rule of construction, and we badly needed the functionality for the
Interdata port. I quickly came to hate it, though -- the casts we were
using looked like a teletype threw up in the middle of the code.
With respect to enums, there is a feature I've wanted for years: a typed
typedef. Saying typedef int foo would make foo an integer, but if you
passed an ordinary int to something declared as foo it would be an
error. Even if it was an integer constant unless cast.
The amount of mechanism required to get that behavior from both C and
C++ is horrible, so far as I know, although C++ has accreted so much
stuff maybe it's there now...
Steve
---
On 2020-04-24 19:54, Rob Pike wrote:
> Another debate at the time was caused by a disagreement between pcc and cc regarding enums: are they a type or just a way to declare constants? I remember getting annoyed by pcc not letting me declare a constant with an enum and use it as an int. I protested to scj and dmr and after some to-ing and fro-ing Steve changed pcc to treat them as constants.
>
> Not sure it was the right decision, but C desperately wanted a non-macro way to define a constant. I'd probably argue the same way today. The real lesson is how propinquity affects progress.
>
> -rbo
>

Interesting that Go had only what you call "typed typedefs" until we needed
to add "untyped typedefs" so we could provide aliasing for forwarding
declarations. And that necessity made me unhappy. But the short version: Go
went the other way with what "typedef" means.
-rob
On Mon, May 11, 2020 at 10:28 AM <scj@yaccman.com> wrote:
> Following up on Rob's comment, I always took the point of view that Dennis
> owned the C description, and what he said goes. Not that I didn't make
> suggestions that he accepted. One of the better ones (actually in B) was ^
> for exclusive OR. One of the worse ones was the syntax for casts. We
> looked at about 5 different ideas and hated all of them. And most of them
> couldn't be easily compiled with Yacc. So I took the grammar for
> declarations, removed the variable name, and voila, it expressed everything
> we wanted in the way of semantics, had a simple rule of construction, and
> we badly needed the functionality for the Interdata port. I quickly came
> to hate it, though -- the casts we were using looked like a teletype threw
> up in the middle of the code.
>
> With respect to enums, there is a feature I've wanted for years: a typed
> typedef. Saying typedef int foo would make foo an integer, but if you
> passed an ordinary int to something declared as foo it would be an error.
> Even if it was an integer constant unless cast.
>
> The amount of mechanism required to get that behavior from both C and C++
> is horrible, so far as I know, although C++ has accreted so much stuff
> maybe it's there now...
>
> Steve
> ---

My mail is screwed up, I see Rob's reply to Steve but didn't see Steve's
original.
> On Mon, May 11, 2020 at 10:28 AM <scj@yaccman.com> wrote:
> > With respect to enums, there is a feature I've wanted for years: a typed
> > typedef. Saying typedef int foo would make foo an integer, but if you
> > passed an ordinary int to something declared as foo it would be an error.
> > Even if it was an integer constant unless cast.
Steve, I couldn't agree more, you are 100% right, this is how it should
work. I wanted to like enums because I naively thought they'd have these
semantics but then learned they really aren't any different than a well
managed list of #defines.
IMHO, without your semantics, enums are pretty useless, #define is good
enough and more clear.
--lm

If I remember correctly, Mesa (the PARC Pascal-like systems language) had typed enums.
The 1979 version of the language manual at http://www.bitsavers.org/pdf/xerox/parc/techReports/CSL-79-3_Mesa_Language_Manual_Version_5.0.pdf
says so anyway.
-L
PS The niftiest use of #define I know about was at the short-lived supercomputer company SiCortex around 2005. Wilson Snyder (of Verilator fame) wrote a thing that extracted all the constants and register definitions from the CPU chip spec and output them as #define equivalents in 5 different languages.
PPS Thank you for '^'
> On 2020, May 10, at 8:28 PM, scj@yaccman.com wrote:
>
> Following up on Rob's comment, I always took the point of view that Dennis owned the C description, and what he said goes. Not that I didn't make suggestions that he accepted. One of the better ones (actually in B) was ^ for exclusive OR. One of the worse ones was the syntax for casts. We looked at about 5 different ideas and hated all of them. And most of them couldn't be easily compiled with Yacc. So I took the grammar for declarations, removed the variable name, and voila, it expressed everything we wanted in the way of semantics, had a simple rule of construction, and we badly needed the functionality for the Interdata port. I quickly came to hate it, though -- the casts we were using looked like a teletype threw up in the middle of the code.
>
> With respect to enums, there is a feature I've wanted for years: a typed typedef. Saying typedef int foo would make foo an integer, but if you passed an ordinary int to something declared as foo it would be an error. Even if it was an integer constant unless cast.
>
> The amount of mechanism required to get that behavior from both C and C++ is horrible, so far as I know, although C++ has accreted so much stuff maybe it's there now...
>
> Steve
>
> ---

On 10 May 2020 17:28 -0700, from scj@yaccman.com:
> With respect to enums, there is a feature I've wanted for years: a typed
> typedef. Saying typedef int foo would make foo an integer, but if you
> passed an ordinary int to something declared as foo it would be an
> error. Even if it was an integer constant unless cast.
Isn't that at least pretty close to how Ada does it?
--
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

At Sun, 10 May 2020 17:57:46 -0700, Larry McVoy <lm@mcvoy.com> wrote:
Subject: Re: [TUHS] v7 K&R C
>
> > On Mon, May 11, 2020 at 10:28 AM <scj@yaccman.com> wrote:
> > > With respect to enums, there is a feature I've wanted for years: a typed
> > > typedef. Saying typedef int foo would make foo an integer, but if you
> > > passed an ordinary int to something declared as foo it would be an error.
> > > Even if it was an integer constant unless cast.
>
> Steve, I couldn't agree more, you are 100% right, this is how it should
> work. I wanted to like enums because I naively thought they'd have these
> semantics but then learned they really aren't any different than a well
> managed list of #defines.
Absolutely agreed!
The lameness of typedef (and in how enum is related to typedef) is one
of the saddest parts of C. (The other is the default promotion to int.)
It would be trivial to fix too -- for a "new" C, that is. Making it
backward compatible for legacy code would be tough, even with tooling to
help fix the worst issues. I've seen far too much code that would be
hard to fix by hand, e.g. some that even goes so far as to assume things
like arithmetic on enum values will produce other valid enum values.
Ideally enums could be a value in any native type, including float/double.
> IMHO, without your semantics, enums are pretty useless, #define is good
> enough and more clear.
Actually that's no longer true with a good modern toolchain, especially
with respect to the debugger. A good debugger can now show the enum
symbol for a (matching) value of a properly typedefed variable.
(In fact I never thought that a #define macro was more clear, even before
debugger support -- the debugger support just gave me a better excuse to
use to explain my preference!)
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>

On 5/11/20, Greg A. Woods <woods@robohack.ca> wrote:
>
> The lameness of typedef (and in how enum is related to typedef) is one
> of the saddest parts of C. (The other is the default promotion to int.)
I would add a third: file-scope declarations being global by default.
One must use the keyword "static" to restrict a file-scope declaration
to the file it's declared in. And why "static"? All file-scope
declarations have static allocation. Why isn't the keyword "local" or
"own"? Anyway, the way it ought to be is that file-scope declarations
are restricted to the file they're declared in. To make the symbol
visible outside its file, you should have to explicitly say "global".
> It would be trivial to fix too -- for a "new" C, that is. Making it
> backward compatible for legacy code would be tough, even with tooling to
> help fix the worst issues. I've seen far too much code that would be
> hard to fix by hand, e.g. some that even goes so far as to assume things
> like arithmetic on enum values will produce other valid enum values.
This ought to be easy to fix using a compiler command line option for
the legacy behavior. Many C compilers do this already to support K&R
semantics vs. standard C semantics.
> Ideally enums could be a value in any native type, including float/double.
Except pointers, of course.
>> IMHO, without your semantics, enums are pretty useless, #define is good
>> enough and more clear.
>
> Actually that's no longer true with a good modern toolchain, especially
> with respect to the debugger. A good debugger can now show the enum
> symbol for a (matching) value of a properly typedefed variable.
Indeed.
-Paul W.

On Mon, May 11, 2020 at 02:25:15PM -0400, Paul Winalski wrote:
> On 5/11/20, Greg A. Woods <woods@robohack.ca> wrote:
> >
> > The lameness of typedef (and in how enum is related to typedef) is one
> > of the saddest parts of C. (The other is the default promotion to int.)
>
> I would add a third: file-scope declarations being global by default.
> One must use the keyword "static" to restrict a file-scope declaration
> to the file it's declared in. And why "static"? All file-scope
I never cared for "static" either, seemed weird. All my code is
#define private static
private int
super_duper(void)
{
...
}
and everyone knows what that means at a glance.
> declarations have static allocation. Why isn't the keyword "local" or
> "own"? Anyway, the way it ought to be is that file-scope declarations
> are restricted to the file they're declared in. To make the symbol
> visible outside its file, you should have to explicitly say "global".
>
> > It would be trivial to fix too -- for a "new" C, that is. Making it
> > backward compatible for legacy code would be tough, even with tooling to
> > help fix the worst issues. I've seen far too much code that would be
> > hard to fix by hand, e.g. some that even goes so far as to assume things
> > like arithmetic on enum values will produce other valid enum values.
>
> This ought to be easy to fix using a compiler command line option for
> the legacy behavior. Many C compilers do this already to support K&R
> semantics vs. standard C semantics.
>
> > Ideally enums could be a value in any native type, including float/double.
>
> Except pointers, of course.
>
> >> IMHO, without your semantics, enums are pretty useless, #define is good
> >> enough and more clear.
> >
> > Actually that's no longer true with a good modern toolchain, especially
> > with respect to the debugger. A good debugger can now show the enum
> > symbol for a (matching) value of a properly typedefed variable.
>
> Indeed.
>
> -Paul W.
--
---
Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

On 5/11/20, Clem Cole <clemc@ccc.com> wrote:
>
> C++ is an example in my mind of not listening to Dennis' words:
>
> - “C is quirky, flawed, and an enormous success.”
Ditto Fortran.
> - “When I read commentary about suggestions for where C should go, I
> often think back and give thanks that it wasn't developed under the
> advice
> of a worldwide crowd.”
The old saying of an elephant being a mouse designed by committee comes to mind.
Language standards committees tend to be like a pack of dogs
contemplating a tree. Each dog isn't satisfied with the tree until
he's peed on it.
> - “A language that doesn't have everything is actually easier to program
> in than some that do”
Big, comprehensive languages such as PL/I, Ada, and C++ tend to have
more than their share of toxic language features--things that shouldn't
be used if you want reliable, easily maintained and understood code.
Ada failed for two reasons: [1] it had cooties because of its
military origins, and [2] it collapsed under the weight of all of its
features.
-Paul W.

Maybe it’s time for C++ subset ‘G’
Joe McGuckin
ViaNet Communications
joe@via.net
650-207-0372 cell
650-213-1302 office
650-969-2124 fax
> On May 11, 2020, at 12:12 PM, Paul Winalski <paul.winalski@gmail.com> wrote:
>
> On 5/11/20, Clem Cole <clemc@ccc.com> wrote:
>>
>> C++ is an example in my mind of not listening to Dennis' words:
>>
>> - “C is quirky, flawed, and an enormous success.”
>
> Ditto Fortran.
>
>> - “When I read commentary about suggestions for where C should go, I
>> often think back and give thanks that it wasn't developed under the
>> advice
>> of a worldwide crowd.”
>
> The old saying of an elephant being a mouse designed by committee comes to mind.
>
> Language standards committees tend to be like a pack of dogs
> contemplating a tree. Each dog isn't satisfied with the tree until
> he's peed on it.
>
>> - “A language that doesn't have everything is actually easier to program
>> in than some that do”
>
> Big, comprehensive languages such as PL/I, Ada, and C++ tend to have
> more than their share of toxic language features--things that shouldn't
> be used if you want reliable, easily maintained and understood code.
> Ada failed for two reasons: [1] it had cooties because of its
> military origins, and [2] it collapsed under the weight of all of its
> features.
>
> -Paul W.

Isn't that effectively what companies do now? Don't they all have a
"Here is what you can use, this and nothing else" doc?
On Mon, May 11, 2020 at 12:57:01PM -0700, joe mcguckin wrote:
> Maybe it’s time for C++ subset ‘G’
>
>
> Joe McGuckin
> ViaNet Communications
>
> joe@via.net
> 650-207-0372 cell
> 650-213-1302 office
> 650-969-2124 fax
>
>
>
> > On May 11, 2020, at 12:12 PM, Paul Winalski <paul.winalski@gmail.com> wrote:
> >
> > On 5/11/20, Clem Cole <clemc@ccc.com> wrote:
> >>
> >> C++ is an example in my mind of not listening to Dennis' words:
> >>
> >> - “C is quirky, flawed, and an enormous success.”
> >
> > Ditto Fortran.
> >
> >> - “When I read commentary about suggestions for where C should go, I
> >> often think back and give thanks that it wasn't developed under the
> >> advice
> >> of a worldwide crowd.”
> >
> > The old saying of an elephant being a mouse designed by committee comes to mind.
> >
> > Language standards committees tend to be like a pack of dogs
> > contemplating a tree. Each dog isn't satisfied with the tree until
> > he's peed on it.
> >
> >> - “A language that doesn't have everything is actually easier to program
> >> in than some that do”
> >
> > Big, comprehensive languages such as PL/I, Ada, and C++ tend to have
> > more than their share of toxic language features--things that shouldn't
> > be used if you want reliable, easily maintained and understood code.
> > Ada failed for two reasons: [1] it had cooties because of its
> > military origins, and [2] it collapsed under the weight of all of its
> > features.
> >
> > -Paul W.
--
---
Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

On 5/11/20, Larry McVoy <lm@mcvoy.com> wrote:
> Isn't that effectively what companies do now? Don't they all have a
> "Here is what you can use, this and nothing else" doc?
>
> On Mon, May 11, 2020 at 12:57:01PM -0700, joe mcguckin wrote:
>> Maybe it’s time for C++ subset ‘G’
Absolutely. The projects that I ran effectively used C++ as a
stronger-typed version of C. A small subset of C++ features were
allowed, but among the prohibited features were:
o multiple inheritance
o operator overloading
o friend classes
o C++ exception handling
o all std:: and STL functions
The last two of these are mainly for performance reasons. throw and
catch play merry hell with compiler optimizations, especially of
global variables.
-Paul W.

> On 5/11/20, Larry McVoy <lm@mcvoy.com> wrote:
> o all std:: and STL functions
>
> The last two of these are mainly for performance reasons. throw and
> catch play merry hell with compiler optimizations, especially of
> global variables.
You'll have to explain to me how templates or the standard library (which
by the way includes all of the C stuff) affects performance. In fact, we
use templates to INCREASE rather than decrease performance. Templating
is almost entirely compile time rewrites.

Just a note: it looks like you are replying to me (see below), but what
you quoted is what Paul wrote. I am most certainly NOT putting myself out there
as a C++ expert, I'm a C guy through and through.
On Tue, May 12, 2020 at 01:35:24PM -0400, ron@ronnatalie.com wrote:
> > On 5/11/20, Larry McVoy <lm@mcvoy.com> wrote:
>
> > o all std:: and STL functions
> >
> > The last two of these are mainly for performance reasons. throw and
> > catch play merry hell with compiler optimizations, especially of
> > global variables.
>
> You'll have to explain to me how templates or the standard library (which
> by the way includes all of the C stuff) affects performance. In fact, we
> use templates to INCREASE rather than decrease performance. Templating
> is almost entirely compile time rewrites.
--
---
Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

On 5/12/20, ron@ronnatalie.com <ron@ronnatalie.com> wrote:
>> On 5/11/20, Larry McVoy <lm@mcvoy.com> wrote:
>
>> o all std:: and STL functions
>>
>> The last two of these are mainly for performance reasons. throw and
>> catch play merry hell with compiler optimizations, especially of
>> global variables.
>
> You'll have to explain to me how templates or the standard library (which
> by the way includes all of the C stuff) affects performance. In fact, we
> use templates to INCREASE rather than decrease performance. Templating
> is almost entirely compile time rewrites.
The C++ standard libraries make heavy use of throw/catch exception
handling. If routine A calls routine B, and B is known by the
compiler to have the capability to throw exceptions, a bunch of
important optimizations can't be done. For example:
o You can't keep global values in registers around the call to B
because the handler that catches an exception that B throws might use
that global variable. So you have to spill the value around the call.
o You can't do value propagation of global variables around the call
to B because a handler might change their values.
And it gets a lot worse when you start doing parallel loop execution.
I implemented a new design for exception handling in a C/C++ compiler
back end, and I found lots of corner cases where the C++ standard was
silent as to what should happen when exceptions are thrown or caught
from parallel threads. Things such as the order of execution of
constructors and destructors for parallel routines when a thrown
exception is unwound, and which side of the parallelization executes
constructors and destructors under those conditions. The committee
just plain never considered those issues.
-Paul W.

On Tue, 12 May 2020, Paul Winalski wrote:
> Absolutely. The projects that I ran effectively used C++ as a
> stronger-typed version of C. A small subset of C++ features were
> allowed, but among the prohibited features were:
[...]
> o operator overloading
[...]
I never could figure out why Stroustrup implemented that "feature"; let's
see, this operator usually means this, except when you use it in that
situation in which case it means something else. Now, try debugging that.
I had to learn C++ for a project at $WORK years ago (the client demanded
it), and boy was I glad when I left...
-- Dave

I never liked call by reference. When I was trying to understand a chunk of
code, it was a great mental simplification to know that whatever a called
routine did, it couldn't have an effect on the code I was trying to
understand except through a returned value and (ghastly) global variables.
Operator overloading is far worse. Now I can't even be sure code I'm
looking at is doing what I thought it did.
On Wed, May 13, 2020 at 7:38 PM Dave Horsfall <dave@horsfall.org> wrote:
> On Tue, 12 May 2020, Paul Winalski wrote:
>
> > Absolutely. The projects that I ran effectively used C++ as a
> > stronger-typed version of C. A small subset of C++ features were
> > allowed, but among the prohibited features were:
>
> [...]
>
> > o operator overloading
>
> [...]
>
> I never could figure out why Stroustrup implemented that "feature"; let's
> see, this operator usually means this, except when you use it in that
> situation in which case it means something else. Now, try debugging that.
>
> I had to learn C++ for a project at $WORK years ago (the client demanded
> it), and boy was I glad when I left...
>
> -- Dave
>

> On May 13, 2020, at 17:42, John P. Linderman <jpl.jpl@gmail.com> wrote:
>
> I never liked call by reference. When I was trying to understand a chunk of code, it was a great mental simplification to know that whatever a called routine did, it couldn't have an effect on the code I was trying to understand except through a returned value and (ghastly) global variables. ...
A Fortran implementation I used years ago kept constants in a "literal pool". So, if you called a subroutine, passing in a constant, there was a possibility that the constant might be modified upon the routine's return. I don't recall this ever causing a problem in practice, but the possibility was amusing...
-r

> On May 13, 2020, at 7:00 PM, Dave Horsfall <dave@horsfall.org> wrote:
>
> I never could figure out why Stroustrup implemented that "feature"; let's
> see, this operator usually means this, except when you use it in that
> situation in which case it means something else. Now, try debugging that.
C continues the tradition begun by Fortran and Algol 60 of overloading the arithmetic operators on the various numeric types. C++ allows new types to be defined; when a new type obeys the generally understood properties of a built-in type, it makes sense to use the same operator (or function) for the corresponding operation on the new type (e.g., addition on complex numbers, arbitrary-precision integers and rationals, polynomials, or matrices).

On Wed, May 13, 2020 at 7:45 PM Rich Morin <rdm@cfcl.com> wrote:
> > On May 13, 2020, at 17:42, John P. Linderman <jpl.jpl@gmail.com> wrote:
> >
> > I never liked call by reference. When I was trying to understand a chunk
> of code, it was a great mental simplification to know that whatever a
> called routine did, it couldn't have an effect on the code I was trying to
> understand except through a returned value and (ghastly) global variables.
> ...
>
> A Fortran implementation I used years ago kept constants in a "literal
> pool". So, if you called a subroutine, passing in a constant, there was a
> possibility that the constant might be modified upon the routine's return.
> I don't recall this ever causing a problem in practice, but the possibility
> was amusing...
>
Ah yes. A long time ago, someone came to me with a mysteriously behaving
Pr1me FORTRAN program; after much head scratching, I found where they were
changing the value of "0".
-- Charles

At Thu, 14 May 2020 09:36:57 +1000 (EST), Dave Horsfall <dave@horsfall.org> wrote:
Subject: Re: [TUHS] v7 K&R C
>
> On Tue, 12 May 2020, Paul Winalski wrote:
>
> > o operator overloading
>
> [...]
>
> I never could figure out why Stroustrup implemented that "feature";
> let's see, this operator usually means this, except when you use it in
> that situation in which case it means something else. Now, try
> debugging that.
Well in the true OO world the ability to "overload" a message (aka what
is sometimes effectively an operator) allows a wise designer to apply
the traditional meaning of that message (operator) to a new kind of
object. Attempts to change the meaning of a message (operator) when
applied to already well known objects is forbidden by good taste and
sane reviewers.
C++ being a bit of a dog's breakfast seems to have given some people the
idea that they can get away with abusing operator overloading for what
can only amount to obfuscation.
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>

On Wed, 13 May 2020, Rich Morin wrote:
> A Fortran implementation I used years ago kept constants in a "literal
> pool". So, if you called a subroutine, passing in a constant, there was
> a possibility that the constant might be modified upon the routine's
> return. I don't recall this ever causing a problem in practice, but the
> possibility was amusing...
As I dimly recall, Fortran has always used call by value/result (or
whatever the term is). So, if you modify an argument that happened to be
passed as a constant...
-- Dave

> On Wed, 13 May 2020, Rich Morin wrote:
>
>> A Fortran implementation I used years ago kept constants in a "literal
>> pool". So, if you called a subroutine, passing in a constant, there was
>> a possibility that the constant might be modified upon the routine's
>> return. I don't recall this ever causing a problem in practice, but the
>> possibility was amusing...
>
> As I dimly recall, Fortran has always used call by value/result (or
> whatever the term is). So, if you modify an argument that happened to be
> passed as a constant...
>
Fortran argument passing to functions is call by reference. Some
compilers had a non-standard exception to allow call by value.

> Ah yes. A long time ago, someone came to me with a mysteriously behaving
> Pr1me FORTRAN program; after much head scratching, I found where they were
> changing the value of "0".
>
It was right up there when I traced a bug to find someone had added this
line to one of the headers:
#define notdef 1


On 5/13/20, Paul McJones <paul@mcjones.org> wrote:
>
> C continues the tradition begun by Fortran and Algol 60 of overloading the
> arithmetic operators on the various numeric types. C++ allows new types to
> be defined; when a new type obeys the generally understood properties of a
> built-in type, it makes sense to use the same operator (or function) for the
> corresponding operation on the new type (e.g., addition on complex numbers,
> arbitrary-precision integers and rationals, polynomials, or matrices).
I agree; that makes sense. But I don't like things such as << and >>
in I/O-related classes.
-Paul W.

On 5/13/20, Rich Morin <rdm@cfcl.com> wrote:
>
> A Fortran implementation I used years ago kept constants in a "literal
> pool". So, if you called a subroutine, passing in a constant, there was a
> possibility that the constant might be modified upon the routine's return.
> I don't recall this ever causing a problem in practice, but the possibility
> was amusing...
Any modern compiler worth its salt does literal pooling. Fortunately
modern operating systems have the concept of read-only address space.
These days attempts to modify literal pool constants will give you a
memory access violation at the point where the illegal modification
was made.
-Paul W.

On Wed, May 13, 2020 at 08:42:55PM -0400, John P. Linderman wrote:
> I never liked call by reference. When I was trying to understand a chunk of
> code, it was a great mental simplification to know that whatever a called
> routine did, it couldn't have an effect on the code I was trying to
> understand except through a returned value and (ghastly) global variables.
Call by value is fine for things like a single integer or whatever. When
you have some giant array, you want to pass a pointer.
And "const" helps a lot with indicating the subroutine isn't going to
change it.

On Thu, May 14, 2020 at 09:36:57AM +1000, Dave Horsfall wrote:
> I had to learn C++ for a project at $WORK years ago (the client demanded
> it), and boy was I glad when I left...
Amen. I'm being a whiney grumpy old man, but I'm sort of glad I'm at the
tail end of my career. Going into it now, there are some bright spots,
and some dim ones, Go seems nice, Rust could have been nice but they just
had to come up with a different syntax, I can't see why anyone would do
anything other than an improved C like syntax, Java and C++ seem awful,
D tried but threw too much into the language like C++ did, if D had had
some restraint like Go does, D would probably be my language of choice.
Personally, I just want a modernized C. If you want to see what I want
take a look at https://www.little-lang.org/
It's got some perl goodness, regexps are part of the syntax, switches
work on strings or regexps as well as constants, it's pleasant. And
completely doable as an extension to C.
Oh, and it has reference counting on auto allocated stuff so when it
goes out of scope, free() is automatic.
--lm

> o operator overloading
>
> I never could figure out why Stroustrup implemented that "feature"; let's
> see, this operator usually means this, except when you use it in that
> situation in which case it means something else. Now, try debugging that.
Does your antipathy extend to C++ IO and its heavily overloaded << and >>?
The essence of object-oriented programming is operator overloading. If you
think integer.add(integer) and matrix.add(matrix) are good, perspicuous,
consistent style, then you have to think that integer+integer and
matrix+matrix are even better. To put it more forcefully: the OO style
is revoltingly asymmetric. If you like it why don't you do everyday
arithmetic that way?
I strongly encouraged Bjarne to support operator overloading, used it
to write beautiful code, and do not regret a bit of it. I will agree,
though, that the coercion rules that come along with operator (and
method) overloading are dauntingly complicated. However, for natural uses
(e.g. mixed-mode arithmetic) the rules work intuitively and well.
Mathematics has prospered on operator overloading, and that's why I
wanted it. My only regret is that Bjarne chose to set the vocabulary of
infix operators in stone. Because there's no way to introduce new ones,
users with poor taste are tempted to recycle the old ones for incongruous
purposes.
C++ offers more features than C and thus more ways to write obscure code.
But when it happens, blame the writer, not the tool.
Doug

On Thu, May 14, 2020 at 2:42 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:
> The essence of object-oriented programming is operator overloading.
Mumble -- I'm not so sure ... Kay coined the term, and I've not directly
taken that away from his writings. But maybe I missed it.
I'm a little reluctant to argue here. I feel a little like I did when I
was arguing with my thesis advisor years ago ;_) I so respect your
opinion, and you have demonstrated to me that you are correct on so many
things.
>
> Mathematics has prospered on operator overloading, and that's why I
> wanted it.
FWIW: That was Wulf's argument for the BLISS syntax for indirection. It
made more sense mathematically. The problem was that the animals were
already beyond the fields and long lost in the forest, so closing the barn
door later didn't help. Bill later recanted: while the idea was the
right one, in practice it was a bad one.
> ...
> users with poor taste are tempted to recycle the old ones for incongruous
> purposes.
>
Ah, this here is, of course, the crux of the issue: who shall be
the arbiters of good taste? Doug, most of the time I agree with you, and I
think you have done a fantastic job of being one of those arbiters. But
like my friend and mentor Wulf, I have to admit that the ugly way we did it
for years stands. What overloading bought, given what we got, seems unbalanced.
There is way too much 'bad' code and I think the overloading multiplies the
bad over the good.
>
> C++ offers more features than C and thus more ways to write obscure code.
>
It's worse than that. The language definition is constantly peed on by the
masses. Whereas Dennis (and Steve) took a very measured approach as to
when and how to add features to C and while it is admittedly quirky, I find
C code a lot more understandable. To me, the features that were added to C
were ones that experience showed made sense [structs/unions/function
prototypes/void/void*]. But as Larry and I have pointed out, not all of
them did (enums). I don't have the same warm feelings about C++.
My complaint with C++ was (is) that it is just 'too much'. If Bjarne had added
classes and some of the original simpler things that his original "C with
Classes" paper had and stopped, I think I might be willing to use it
today. But like Larry, I avoid it if at all possible. In practice, it's a
tarbaby, and little good has come of it in my world.
I applaud Rob, Ken, Brian and Russ with Go - I think they hit on a better
medium, certainly for userspace code. And thankfully they have not (so
far) been tempted to 'fix it' (although I have heard rumors that Russ has
things up his sleeve). And for me, the jury is still out on Rust (Dan
Cross, I admit, got me to rethink its value a bit, but I have not yet used
it for anything). And Python, which I had hoped would be a reasonable
replacement for Perl, also became a mess when people 'improved' it.
> But when it happens, blame the writer, not the tool.
>
Fair point. I've seen some awesome BLISS code in my day. I bet if I
looked I could find some excellent C++. But in my experience, the signal
to noise ratio is not in favor of either.
Respectfully,
Clem

Larry McVoy <lm@mcvoy.com> wrote:
>
> It's got some perl goodness, regexps are part of the syntax, ....
I got into Unix after perl and I've used it a lot. Back in the 1990s I saw
Henry Spencer's joke that perl was the Swiss Army Chainsaw of Unix, as a
riff on lex being its Swiss Army Knife. I came to appreciate lex
regrettably late: lex makes it remarkably easy to chew through a huge pile
of text and feed the pieces to some library code written in C. I've been
using re2c recently (http://re2c.org/), which is differently weird than
lex, though it still uses YY in all its variable names. It's remarkable
how much newer lexer/parser generators can't escape from the user
interface of lex/yacc. Another YY example: http://www.hwaci.com/sw/lemon/
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Trafalgar: Cyclonic 6 to gale 8. Rough occasionally very rough in west and
south. Thundery showers. Good, occasionally poor.

Perhaps for the first time in my career, I am about to disagree with Doug
McIlroy. Sorry, Doug, but I feel the essence of object-oriented computing
is not operator overloading but the representation of behavior. I know you
love using o.o. in OO languages, but that is syntax, not semantics, and OO,
not o.o., is about semantics.
And of course, the purest of the OO languages do represent arithmetic as
methods, but the fit of OO onto C was never going to be smooth.
-rob
On Fri, May 15, 2020 at 4:42 AM Doug McIlroy <doug@cs.dartmouth.edu> wrote:
> > o operator overloading
> >
> > I never could figure out why Stroustrup implemented that "feature"; let's
> > see, this operator usually means this, except when you use it in that
> > situation in which case it means something else. Now, try debugging
> that.
>
> Does your antipathy extend to C++ IO and its heavily overloaded << and >>?
>
> The essence of object-oriented programming is operator overloading. If you
> think integer.add(integer) and matrix.add(matrix) are good, perspicuous,
> consistent style, then you have to think that integer+integer and
> matrix+matrix are even better. To put it more forcefully: the OO style
> is revoltingly asymmetric. If you like it why don't you do everyday
> arithmetic that way?
>
> I strongly encouraged Bjarne to support operator overloading, used it
> to write beautiful code, and do not regret a bit of it. I will agree,
> though, that the coercion rules that come along with operator (and
> method) overloading are dauntingly complicated. However, for natural uses
> (e.g. mixed-mode arithmetic) the rules work intuitively and well.
>
> Mathematics has prospered on operator overloading, and that's why I
> wanted it. My only regret is that Bjarne chose to set the vocabulary of
> infix operators in stone. Because there's no way to introduce new ones,
> users with poor taste are tempted to recycle the old ones for incongruous
> purposes.
>
> C++ offers more features than C and thus more ways to write obscure code.
> But when it happens, blame the writer, not the tool.
>
> Doug
>
>

Being Scottish and in the 70s, our world was constrained by UK import restrictions - to protect our industries. As a boy I cut my teeth on a language called Algol68 that ran on an ICL 1904 (24 bit word and 6 bit byte, generally a capital-letters-only system!).
The language was part of my academic course work.
OK, it was not an OO language but - in 1968 - it had strict type checking, structures, user-defined types, enums, void, casts, and user-defined operators (overloaded), both infix and prefix (all defined on a formal mathematical basis giving syntax and semantics), together with “environment enquiries” to find out how big an int was or the precision of a float.
Users could also define their own operators - think of them as no more than strange names for a variable or procedure - and also allocate a priority to the various operators in that world (monadics ALWAYS had a priority of 10 and bound tightest). But it went too far. You could define (note that the concept of += did not exist in the base language in 1968) a new operator such as “+:=“
op +:= = (ref int a, int b) ref int: a:=a+b; ¢ It took a pointer to an int, an int, and returned the pointer ¢
[Of course you could also define it to be
op +:= = (ref int a, int b) ref int: a:=a-b+7;
]
You could even use Jensen’s device with operators. If you dont know Algol68, have a speed read of https://research.vu.nl/ws/portalfiles/portal/74119499/11057
My move to unix and C in the 70s was a huge retro step for me - but I could not develop systems code in Algol68 - for example, the transput library was about 8K before you blinked. Certainly in C we could code more and faster - no type-checking, and we had enuf experience of compilers to understand what was going on at the machine code level - we could just drive the I/O registers directly.
Then C++? Like microsoft windows I evaluated, tried it a bit and voted the theory good but the smell bad. I had a few students who wrote in C++ over a few years, but you know what, it did not do anything earth shattering and it could be a b*gger to work on a debug of a 20K line student program! Like some here I think C++ was just on the wrong side of a line that I dont understand. Similarly, for me, perl is on one side of that line and python is far over the other side.
My question is:
What is that line? I dont understand it? Effort input vs output? Complexity measure, debugging complexity in a 3rd party program? [I hated assembler too unless it was my own (or good) ;-)] But machine code was good, few people would do too much in a complicated way writing in binary/octal/hex!
> On 15 May 2020, at 03:44, Rob Pike <robpike@gmail.com> wrote:
>
> Perhaps for the first time in my career, I am about to disagree with Doug McIlroy. Sorry, Doug, but I feel the essence of object-oriented computing is not operator overloading but the representation of behavior. I know you love using o.o. in OO languages, but that is syntax, not semantics, and OO, not o.o., is about semantics.
>
> And of course, the purest of the OO languages do represent arithmetic as methods, but the fit of OO onto C was never going to be smooth.
>
> -rob
>
>
> On Fri, May 15, 2020 at 4:42 AM Doug McIlroy <doug@cs.dartmouth.edu <mailto:doug@cs.dartmouth.edu>> wrote:
> > o operator overloading
> >
> > I never could figure out why Stroustrup implemented that "feature"; let's
> > see, this operator usually means this, except when you use it in that
> > situation in which case it means something else. Now, try debugging that.
>
> Does your antipathy extend to C++ IO and its heavily overloaded << and >>?
>
> The essence of object-oriented programming is operator overloading. If you
> think integer.add(integer) and matrix.add(matrix) are good, perspicuous,
> consistent style, then you have to think that integer+integer and
> matrix+matrix are even better. To put it more forcefully: the OO style
> is revoltingly asymmetric. If you like it why don't you do everyday
> arithmetic that way?
>
> I strongly encouraged Bjarne to support operator overloading, used it
> to write beautiful code, and do not regret a bit of it. I will agree,
> though, that the coercion rules that come along with operator (and
> method) overloading are dauntingly complicated. However, for natural uses
> (e.g. mixed-mode arithmetic) the rules work intuitively and well.
>
> Mathematics has prospered on operator overloading, and that's why I
> wanted it. My only regret is that Bjarne chose to set the vocabulary of
> infix operators in stone. Because there's no way to introduce new ones,
> users with poor taste are tempted to recycle the old ones for incongruous
> purposes.
>
> C++ offers more features than C and thus more ways to write obscure code.
> But when it happens, blame the writer, not the tool.
>
> Doug
>

On Fri, May 15, 2020 at 08:55:46AM +0100, Dr Iain Maoileoin wrote:
> My question is:
> What is that line? I dont understand it? Effort input vs output?
> Complexity measure, debugging complexity in a 3rd party program?
I think you are asking precisely the right question. There is a line
where one side is good enough and good enough is just that. The other
side is just too much, too much to get tangled up in.
I think Rob and Ken and the rest of the Go team are not lauded enough
for having the restraint to stay on the right side of the line. It's
so easy to be seduced into yet another feature, "it's not that bad,
we can do it." It is far harder to say "Nope, not gonna do that".
The Go team reminds me a bit of the original QNX team. QNX was/is
a message passing microkernel, it's the only microkernel that is
actually micro (in my experience, I hear that L4 is good but haven't
looked). They had a core team of 3 people who were allowed to touch
the actual microkernel, all of which fit into a 4K instruction cache
on x86. Every commit went through the filter of "does this add to
the cache?"
That sort of discipline is really rare. Far more common is "I benchmarked
it and it only slows down by 1%" - that's death by a thousand paper cuts.

[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]
On Fri, May 15, 2020 at 08:55:46AM +0100, Dr Iain Maoileoin wrote:
> My question is:
> What is that line? I dont understand it? Effort input vs output?
> Complexity measure, debugging complexity in a 3rd party program?
I think of it less as a line than as a continuum. I am reminded of an input
routine I wrote for a sort several decades ago. Allocate a pointer from one
end of memory, start reading a record into the other end of memory with
something like
while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is now there */
Very easy to understand, no guesswork about allocating pointers versus
records, and a complete and utter pig. 50% of the processing time of the
entire sort went into loading records. There were 5 or 6 comparisons
being done for every *character* of input (I omitted the bit about
verifying that there was room to store the next character). It might have
made for good reading, but nobody would have used it, because much faster
sorts were already available, and most people *use* code, not *read* it. So
I had to push into uglier territory to get something that worked well
enough to be worth reading.
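For concreteness, here is a runnable sketch of the loader described above. The arena size, the names, and the placement of the room check are my assumptions, not the original code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARENA 4096              /* assumed size; the original's is unknown */

static union {
    char bytes[ARENA];
    char *align_;               /* keeps the top end pointer-aligned */
} arena;

/* Load newline-terminated records from fp in the style described above:
 * record text packs upward from the bottom of one arena while record
 * pointers grow downward from the top. The in-loop room check is the
 * part the text says it omitted. Returns the record count and sets
 * *recs to the pointer array (most recent record first). */
size_t load_records(FILE *fp, char ***recs)
{
    char *text = arena.bytes;
    char **ptrs = (char **)(arena.bytes + ARENA);
    char *start = text;
    int c;                      /* int, so EOF stays distinct from bytes */

    while ((c = getc(fp)) != EOF) {
        if (text >= (char *)(ptrs - 1))   /* room for byte and pointer? */
            break;
        if (c == '\n') {                  /* entire record is now there */
            *text++ = '\0';
            *--ptrs = start;
            start = text;
        } else {
            *text++ = (char)c;
        }
    }
    *recs = ptrs;
    return (size_t)((char **)(arena.bytes + ARENA) - ptrs);
}
```

As the message says, it is the per-character comparisons in that loop that made the loader clear but slow.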

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]
EOF is defined to be -1.
getchar() returns int, but if c is an unsigned char, the value of (c = getchar()) will be 255 when getchar() returns EOF. This will never compare equal to -1.
Ron,
Hmmm... getchar/getc are defined as returning int in the man page, and c is traditionally defined as an int in this code...
On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com <mailto:ron@ronnatalie.com> > wrote:
Unfortunately, if c is char on a machine with unsigned chars, or it’s of type unsigned char, the EOF will never be detected.
* while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is now there */
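The fix both messages point at is to keep the result of getchar() in an int; a minimal sketch:

```c
#include <assert.h>
#include <stdio.h>

/* Count newline-terminated records. c must be an int so that EOF (-1)
 * stays distinct from every byte value, including 0xff. */
long count_records(FILE *fp)
{
    int c;                      /* int: not char, not unsigned char */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        if (c == '\n')          /* entire record is now there */
            n++;
    return n;
}
```

With `unsigned char c` the loop never terminates (the EOF assignment yields 255); with plain `char` on a signed-char machine, a 0xff data byte falsely ends it.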

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]
I suspect we are saying the same thing. C is defined as an int (as Larry
also showed), not an unsigned char (and frankly if you had done that, most
modern compilers will give you a warning). IIRC you are correct that the
Ritchie compiler would not catch that error.
But, the truth is I know few C (experienced) programmers that would define
c as anything but an int; particularly in the modern era with compiler
warnings as good as they are.
Clem
On Fri, May 15, 2020 at 4:18 PM <ron@ronnatalie.com> wrote:
> EOF is defined to be -1.
>
> getchar() returns int, but c is a unsigned char, the value of (c =
> getchar()) will be 255. This will never compare equal to -1.
>
>
>
>
>
>
>
> Ron,
>
>
>
> Hmmm... getchar/getc are defined as returning int in the man page and C is
> traditionally defined as an int in this code..
>
>
>
> On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
>
> Unfortunately, if c is char on a machine with unsigned chars, or it’s of
> type unsigned char, the EOF will never be detected.
>
>
>
>
>
>
>
>
> - while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is
> now there */
>
>
>
>

> I feel the essence of object-oriented computing
> is not operator overloading but the representation of behavior.
Rob is right. Overloading is a universal characteristic
of OO programming, but not the essence.
Doug

[-- Attachment #1: Type: text/plain, Size: 529 bytes --]
On Fri, May 15, 2020, 2:35 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:
> > I feel the essence of object-oriented computing
> > is not operator overloading but the representation of behavior.
>
> Rob is right. Overloading is a universal characteristic
> of OO programming, but not the essence.
>
I've viewed the essence as: everything is an object, and I can send messages
to objects. From there, many different styles flow, as different efforts
leveraged different means and methods to purport to achieve that goal.
Warner
>

[-- Attachment #1: Type: text/plain, Size: 301 bytes --]
On Fri, 15 May 2020, ron@ronnatalie.com wrote:
> Unfortunately, if c is char on a machine with unsigned chars, or it’s of
> type unsigned char, the EOF will never be detected.
Isn't it nonstandard (although I am aware of some compilers that do it) to
default the type of char to unsigned?
-uso.

> Isn't it nonstandard (although I am aware of some compilers that do it) to
> default the type of char to unsigned?
No.
"The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char."
- C99
(Technically it's a separate type from both of them.)
-- Richard
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
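A minimal probe for which choice an implementation made, using only standard macros:

```c
#include <assert.h>
#include <limits.h>

/* Plain char must have the range of either signed char or unsigned
 * char (C99 6.2.5); CHAR_MIN reveals which one this compiler chose. */
const char *char_signedness(void)
{
    return CHAR_MIN < 0 ? "signed" : "unsigned";
}
```

Printing char_signedness() gives "signed" on most platforms and "unsigned" on, e.g., the common ARM ABIs.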

On Fri, 15 May 2020, Richard Tobin wrote:
>> Isn't it nonstandard (although I am aware of some compilers that do it) to
>> default the type of char to unsigned?
>
> No.
>
> "The implementation shall define char to have the same range,
> representation, and behavior as either signed char or unsigned char."
> - C99
>
> (Technically it's a separate type from both of them.)
>
> -- Richard
>
>
Huh. I thought all integers were supposed to be signed by default
regardless of their size. o.o
That said, I do use "int c; ... c=fgetc(stdin);" or the like in my code.
-uso.

Char is different. One of the silly foibles of C. char can be signed or
unsigned at the implementation's decision.
-----Original Message-----
From: Steve Nickolas <usotsuki@buric.co>
Sent: Friday, May 15, 2020 5:53 PM
To: Richard Tobin <richard@inf.ed.ac.uk>
Cc: Steve Nickolas <usotsuki@buric.co>; ron@ronnatalie.com; tuhs@tuhs.org
Subject: Re: [TUHS] v7 K&R C
Huh. I thought all integers were supposed to be signed by default
regardless of their size. o.o
That said, I do use "int c; ... c=fgetc(stdin);" or the like in my code.
-uso.

ron@ronnatalie.com wrote in
<077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com>:
|Char is different. One of the silly foibles of C. char can be signed or
|unsigned at the implementation's decision.
And i would wish Thompson and Pike would have felt the need to
design UTF-8 ten years earlier. Maybe we would have a halfway
usable "wide" character interface in the standard (C) library.
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

Discussions today on the TUHS list about the signed/unsigned nature of
the C char type led me to reexamine logs of my feature test package at
http://www.math.utah.edu/pub/features/
I had 170 build logs for it from 2017.11.07, so I moved those aside
and ran another set of builds in our current enlarged test farm. That
generated another 361 fresh builds. Those tests are all with the C
compiler named "cc". I did not explore what other C compilers did,
but I strongly suspect that they all agree on any single platform.
On all but THREE systems, the tests report that "char" is signed, with
CHAR_MAX == +127.
The three outliers have char unsigned with CHAR_MAX == +255, and are
* ARM armv7l Linux 4.13.1 (2017) and 5.6.7 (2020)
* SGI O2 R10000-SC (150 MHz) IRIX 6.5 (2017 and 2020)
* IBM POWER8 CentOS Linux release 7.4.1708 (AltArch) (2017)
So, while the ISO C Standards, and historical practice, leave it
implementation dependent whether char is signed or unsigned, there is
a strong majority for a signed type.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu -
- 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

Nelson H. F. Beebe wrote in
<CMM.0.95.0.1589588129.beebe@gamma.math.utah.edu>:
|Discussions today on the TUHS list about the signed/unsigned nature of
|the C char type led me to reexamine logs of my feature test package at
|
| http://www.math.utah.edu/pub/features/
|
|I had 170 build logs for it from 2017.11.07, so I moved those aside
|and ran another set of builds in our current enlarged test farm. That
|generated another 361 fresh builds. Those tests are all with the C
|compiler named "cc". I did not explore what other C compilers did,
|but I strongly suspect that they all agree on any single platform.
|
|On all but THREE systems, the tests report that "char" is signed, with
|CHAR_MAX == +127.
|
|The three outliers have char unsigned with CHAR_MAX == +255, and are
|
| * ARM armv7l Linux 4.13.1 (2017) and 5.6.7 (2020)
| * SGI O2 R10000-SC (150 MHz) IRIX 6.5 (2017 and 2020)
| * IBM POWER8 CentOS Linux release 7.4.1708 (AltArch) (2017)
|
|So, while the ISO C Standards, and historical practice, leave it
|implementation dependent whether char is signed or unsigned, there is
|a strong majority for a signed type.
Just to note Linus Torvalds' "famous" "It better had been unsigned,
Virginia".
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

[-- Attachment #1: Type: text/plain, Size: 480 bytes --]
If I had been thick enough to declare c as an unsigned char, it would have
taken a good bit more than 50% of the time, and wouldn't have worked at all.
On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
> Unfortunately, if c is char on a machine with unsigned chars, or it’s of
> type unsigned char, the EOF will never be detected.
>
>
>
>
>
>
>
>
> - while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is
> now there */
>
>
>

I always kept local, single characters in ints. This avoided the problem with loading a character being signed or unsigned. The reason for not specifying is obvious. Today, you can pick the move-byte-into-word instruction that either sign extends or doesn't. But when C was defined that wasn't the case. Some machines sign extended when a byte was loaded into a register and some filled the upper bits with zero. For machines that filled with zero, a char was unsigned. If you forced the language to do one or the other, it would be expensive on the opposite kind of machine.
It's one of the things that made C a good choice on a wide variety of machines.
I guess I always "saw" the return value of the getchar() as being in a int sized register, at first namely R0, so kept the character values returned as ints. The actual EOF indication from a read is a return value of zero for the number of characters read.
But I'm just making noise because I'm sure everyone knows all this.
Brantley
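The read(2) convention mentioned above is easy to see with a pipe; a sketch (POSIX, not ISO C):

```c
#include <assert.h>
#include <unistd.h>

/* Returns what read() yields once the writer has closed its end:
 * the kernel-level EOF indication is a byte count of zero, not -1. */
ssize_t read_after_writer_closes(void)
{
    int fds[2];
    char buf[16];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "record\n", 7) != 7)
        return -1;
    close(fds[1]);              /* no more writers: EOF follows the data */

    n = read(fds[0], buf, sizeof buf);      /* drains the 7 bytes */
    if (n == 7)
        n = read(fds[0], buf, sizeof buf);  /* now 0: end of file */
    close(fds[0]);
    return n;
}
```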
> On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote:
>
> EOF is defined to be -1.
> getchar() returns int, but c is a unsigned char, the value of (c = getchar()) will be 255. This will never compare equal to -1.
>
>
>
> Ron,
>
> Hmmm... getchar/getc are defined as returning int in the man page and C is traditionally defined as an int in this code..
>
> On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
>> Unfortunately, if c is char on a machine with unsigned chars, or it’s of type unsigned char, the EOF will never be detected.
>>
>>
>>
>>> • while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is now there */

On Sat, May 16, 2020 at 01:34:27AM +0200, Steffen Nurpmeso wrote:
> ron@ronnatalie.com wrote in
> <077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com>:
> |Char is different. One of the silly foibles of C. char can be signed or
> |unsigned at the implementation's decision.
>
> And i would wish Thompson and Pike would have felt the need to
> design UTF-8 ten years earlier. Maybe we would have a halfway
> usable "wide" character interface in the standard (C) library.
Yeah, I agree. UTF-8 is clever, really clever. It makes the other
stuff look ham-handed.

[-- Attachment #1: Type: text/plain, Size: 1408 bytes --]
On Fri, May 15, 2020 at 6:23 PM Nelson H. F. Beebe <beebe@math.utah.edu>
wrote:
> Discussions today on the TUHS list about the signed/unsigned nature of
> the C char type led me to reexamine logs of my feature test package at
>
> http://www.math.utah.edu/pub/features/
>
> I had 170 build logs for it from 2017.11.07, so I moved those aside
> and ran another set of builds in our current enlarged test farm. That
> generated another 361 fresh builds. Those tests are all with the C
> compiler named "cc". I did not explore what other C compilers did,
> but I strongly suspect that they all agree on any single platform.
>
> On all but THREE systems, the tests report that "char" is signed, with
> CHAR_MAX == +127.
>
> The three outliers have char unsigned with CHAR_MAX == +255, and are
>
> * ARM armv7l Linux 4.13.1 (2017) and 5.6.7 (2020)
> * SGI O2 R10000-SC (150 MHz) IRIX 6.5 (2017 and 2020)
> * IBM POWER8 CentOS Linux release 7.4.1708 (AltArch) (2017)
>
> So, while the ISO C Standards, and historical practice, leave it
> implementation dependent whether char is signed or unsigned, there is
> a strong majority for a signed type.
>
arm has been the biggest outlier in terms of unsigned char. In FreeBSD,
this has been the second largest source of bugs with the platform... the
OABI weird alignment requirements being the first (thankfully behind us)...
Warner

On Sat, 16 May 2020, Peter Jeremy wrote:
> On 2020-May-15 16:56:42 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>> Isn't it nonstandard (although I am aware of some compilers that do it) to
>> default the type of char to unsigned?
>
> The standard allows "char" to be either signed or unsigned. The ARM ABI
> defines char as unsigned.
>
> I recall that Lattice C on the M68K allowed either signed or unsigned char
> via a flag. Setting it to "unsigned" generally produced faster code on
> my Amiga, though some code assumed signed chars and broke.
Borland did the same.
CC65, I think, defaults to unsigned char, but it's missing some other
features. It is, however, the closest (to my knowledge) that C on the
6502 gets to the ANSI standard.
-uso.

[-- Attachment #1: Type: text/plain, Size: 3297 bytes --]
On Fri, May 15, 2020 at 8:58 PM Brantley Coile <brantley@coraid.com> wrote:
> I always kept local, single characters in ints. This avoided the problem
> with loading a character being signed or unsigned. The reason for not
> specifying is obvious. Today, you can pick the move-byte-into-word
> instruction that either sign extends or doesn't. But when C was defined
> that wasn't the case. Some machines sign extended when a byte was loaded
> into a register and some filled the upper bits with zero. For machines that
> filled with zero, a char was unsigned. If you forced the language to do one
> or the other, it would be expensive on the opposite kind of machine.
>
Not only that, but if one used an exactly `char`-width value to hold, er,
character data as returned from `getchar` et al, then one would necessarily
give up the possibility of handling whatever character value was chosen for
the sentinel marking end-of-input stream. `getchar` et al are defined to
return EOF on end of input; if they didn't return a wider type than `char`,
there would be data that could not be read. On probably every machine I am
ever likely to use again in my lifetime, byte value 255 would be -1 as a
signed char, but it is also a perfectly valid value for a byte.
The details of whether char is signed or unsigned aside, use of a wider
type is necessary for correctness and ability to completely represent the
input data.
> It's one of the things that made C a good choice on a wide variety of
> machines.
>
> I guess I always "saw" the return value of the getchar() as being in a int
> sized register, at first namely R0, so kept the character values returned
> as ints. The actual EOF indication from a read is a return value of zero
> for the number of characters read.
>
That's certainly true. Had C supported multiple return values or some kind
of option type from the outset, it might have been that `getchar`, read,
etc, returned a pair with some useful value (e.g., for `getchar` the value
of the byte read; for `read` a length) and some indication of an
error/EOF/OK value etc. Notably, both Go and Rust support essentially this:
in Go, `io.Read()` returns a `(int, error)` pair, and the error is `io.EOF`
on end-of-input; in Rust, the `read` method of the `Read` trait returns a
`Result<usize, io::Error>`, where a `Result::Ok(n)` with `n == 0`
indicates EOF.
> But I'm just making noise because I'm sure everyone knows all this.
>
I think it's worthwhile stating these things explicitly, sometimes.
- Dan C.
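A hypothetical sketch of what that pair-returning interface could look like in C itself; `ch_result` and `getchar_r` are invented names for illustration, not any real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented for illustration: a (value, ok) pair instead of widening
 * the result to int and reserving -1 for EOF. */
struct ch_result {
    unsigned char ch;           /* every byte value 0..255 fits */
    bool ok;                    /* false on end of input or error */
};

struct ch_result getchar_r(FILE *fp)
{
    int c = getc(fp);
    struct ch_result r = { (unsigned char)c, c != EOF };
    return r;
}
```

With the status carried out of band, a 0xff data byte and end-of-input are never conflated.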
> On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote:
> >
> > EOF is defined to be -1.
> > getchar() returns int, but c is a unsigned char, the value of (c =
> getchar()) will be 255. This will never compare equal to -1.
> >
> >
> >
> > Ron,
> >
> > Hmmm... getchar/getc are defined as returning int in the man page and C
> is traditionally defined as an int in this code..
> >
> > On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
> >> Unfortunately, if c is char on a machine with unsigned chars, or it’s
> of type unsigned char, the EOF will never be detected.
> >>
> >>
> >>
> >>> • while ((c = getchar()) != EOF) if (c == '\n') { /* entire record
> is now there */
>
>

> On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
>
>Unfortunately, if c is char on a machine with unsigned chars, or it’s of
>type unsigned char, the EOF will never be detected.
>
> - while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is now there */
The function prototype for getchar() is: int getchar(void);
It returns an int, not a char. In all likelihood this is specifically
*because* EOF is defined as -1. The above code works fine if c is an
int. One always has to be very careful when doing a typecast of a
function return value.
-Paul W.

On 5/15/20, Warner Losh <imp@bsdimp.com> wrote:
>
> arm has been the biggest outlier in terms of unsigned char. In FreeBSD,
> this has been the second largest source of bugs with the platform... the
> OABI weird alignment requirements being the first (thankfully behind us)...
Why did the implementers of the Unix ABI for ARM decide to have char
be unsigned? Was there an architectural reason for it?
-Paul W.

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]
On Sat, May 16, 2020 at 10:28 AM Paul Winalski <paul.winalski@gmail.com>
wrote:
> > On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
> >
> >Unfortunately, if c is char on a machine with unsigned chars, or it’s of
> >type unsigned char, the EOF will never be detected.
> >
> > - while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is
> now there */
>
> The function prototype for getchar() is: int getchar(void);
>
> It returns an int, not a char. In all likelihood this is specifically
> *because* EOF is defined as -1. The above code works fine if c is an
> int. One always has to be very careful when doing a typecast of a
> function return value.
>
In the early days of my involvement with FreeBSD, I went through and fixed
about a dozen cases where the return value of getopt was being assigned to a
char and then compared with EOF. I'm certain that this is why. Also, EOF has
to be a value that's not representable by a character, or your 0xff bytes
would disappear.
Warner

> The function prototype for getchar() is: int getchar(void);
>
> It returns an int, not a char. In all likelihood this is specifically
> *because* EOF is defined as -1.
It would have probably returned int anyway, because of the automatic
promotion of char to int in expressions. It was natural to declare
functions returning char as int, if you bothered to declare them at
all. As K&R1 said:
Since char promotes to int in expressions, there is no need
to declare functions that return char.
Similarly functions that might return short or float would normally
return int or double; there aren't separate atof and atod functions
for example.
-- Richard
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
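The promotion is easy to demonstrate; a small sketch (ASCII assumed for the arithmetic):

```c
#include <assert.h>

/* A function whose natural result is a char; pre-ANSI code would have
 * declared it int, since the value promotes to int in expressions. */
char low(void)
{
    return 'a';
}
```

In `low() - 32` the char value promotes to int before the subtraction, and `sizeof(low() + 0) == sizeof(int)` for the same reason.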

Paul Winalski <paul.winalski@gmail.com> writes:
> On 5/15/20, Warner Losh <imp@bsdimp.com> wrote:
>>
>> arm has been the biggest outlier in terms of unsigned char. In FreeBSD,
>> this has been the second largest source of bugs with the platform... the
>> OABI weird alignment requirements being the first (thankfully behind us)...
>
> Why did the implementers of the Unix ABI for ARM decide to have char
> be unsigned? Was there an architectural reason for it?
>
> -Paul W.
My understanding is that it is a lot more efficient to use unsigned char
on arm. You can make gcc, for example, deal with this, but it costs. I
remember having to tell gcc to deal with it when I ported the Doom
engine to a StrongARM processor device under NetBSD many years ago. I
mostly remember the code running well enough, but it was larger.
--
Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]
On Sat, May 16, 2020 at 2:35 PM Brad Spencer <brad@anduin.eldar.org> wrote:
> Paul Winalski <paul.winalski@gmail.com> writes:
>
> > On 5/15/20, Warner Losh <imp@bsdimp.com> wrote:
> >>
> >> arm has been the biggest outlier in terms of unsigned char. In FreeBSD,
> >> this has been the second largest source of bugs with the platform... the
> >> OABI weird alignment requirements being the first (thankfully behind
> us)...
> >
> > Why did the implementers of the Unix ABI for ARM decide to have char
> > be unsigned? Was there an architectural reason for it?
> >
> > -Paul W.
>
>
> My understanding is that it is a lot more efficient to use unsigned char
> on arm. You can make gcc, for example, deal with this, but it costs. I
> remember having to tell gcc to deal with it when I ported the Doom
> engine to a StrongARM processor device under NetBSD many years ago. I
> mostly remember the code running well enough, but it was larger.
>
I've seen numbers that suggest it's about 10% smaller to use unsigned
characters, and the code runs 5-10% faster. I've not looked at the
generated code to understand why, exactly, that might be the case.
Warner

It would have to be something bigger than char because you need EOF (whatever it could be defined as) to be distinct from any character.
> On May 16, 2020, at 2:45 PM, Richard Tobin <richard@inf.ed.ac.uk> wrote:
>
>> The function prototype for getchar() is: int getchar(void);
>>
>> It returns an int, not a char. In all likelihood this is specifically
>> *because* EOF is defined as -1.
>
> It would have probably returned int anyway, because of the automatic
> promotion of char to int in expressions. It was natural to declare
> functions returning char as int, if you bothered to declare them at
> all. As K&R1 said:
>
> Since char promotes to int in expressions, there is no need
> to declare functions that return char.
>
> Similarly functions that might return short or float would normally
> return int or double; there aren't separate atof and atod functions
> for example.
>
> -- Richard
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>

The issue is making char play double duty as a basic storage unit and a native character.
This means you can never have 16- (or 32-)bit chars on any machine on which you wanted to support 8-bit integers.
> On May 15, 2020, at 7:34 PM, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>
> ron@ronnatalie.com wrote in
> <077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com>:
> |Char is different. One of the silly foibles of C. char can be signed or
> |unsigned at the implementation's decision.
>
> And i would wish Thompson and Pike would have felt the need to
> design UTF-8 ten years earlier. Maybe we would have a halfway
> usable "wide" character interface in the standard (C) library.
>
> --steffen
> |
> |Der Kragenbaer, The moon bear,
> |der holt sich munter he cheerfully and one by one
> |einen nach dem anderen runter wa.ks himself off
> |(By Robert Gernhardt)

Ronald Natalie wrote in
<5DB09C5A-F5DA-4375-AAA5-0711FC6FB1D9@ronnatalie.com>:
|> On May 15, 2020, at 7:34 PM, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
|> ron@ronnatalie.com wrote in
|> <077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com>:
|>|Char is different. One of the silly foibles of C. char can be \
|>|signed or
|>|unsigned at the implementation's decision.
|>
|> And i would wish Thompson and Pike would have felt the need to
|> design UTF-8 ten years earlier. Maybe we would have a halfway
|> usable "wide" character interface in the standard (C) library.
|The issue is making char play double duty as a basic storage unit and \
|a native character.
|This means you can never have 16 (or 32 bit) chars on any machine that \
|you wanted to support 8 bit integers.
Oh, I am not the person to step in here.
[I deleted 60+ lines of char*/void*, and typedefs,
etc. experiences i had. And POSIX specifying that a byte has
8 bits. And soon that NULL/(void*)0 has all bits 0.]
Unicode / ISO 10646 did not exist by then. sure.
I am undecided. I was a real fan of UTF-32 (32-bit character)
at times, but when i looked more deeply in Unicode, it turned
out to be false thinking: some languages are so complex that you
need to address entire sentences, or at least encapsulate
"grapheme" boundaries; going for "codepoints" is just wrong.
Then i thought Microsoft and their UTF-16 decision was not that
bad, because almost all real life characters of Unicode can
nonetheless be addressed by a single 16-bit codepoint, and that
eases programming. But moreover UTF-8 needs three bytes for
most of them.
Why did it happen? Why was the char type overloaded like this?
Why was there no byte or "mem" type? It is to this day, i think,
that ISO C allows to bypass their (terrible) aliasing rules by
casting to and from char*.
In v5 usr/src/s2/mail.c i see
getfield(buf)
char buf[];
{
	int j;
	char c;

	j = 0;
	while((c = buf[j] = getc(iobuf)) >= 0)
		if(c==':' || c=='\n') {
			buf[j] = 0;
			return(1);
		} else
			j++;
	return(0);
}
so here the EOF was different and char was signed 7-bit, it seems.
At this point, at the latest, i have to admit that i have not
looked at old source code for years. But i just had a quick look
in the dmr/ directory of the 5th revision, and there you see
"char lowbyte", for example.
A nice Sunday from Germany! i wish you, and the list,
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

Tony Finch wrote in
<alpine.DEB.2.20.2005142316170.3374@grey.csi.cam.ac.uk>:
|Larry McVoy <lm@mcvoy.com> wrote:
|>
|> It's got some perl goodness, regexps are part of the syntax, ....
|
|I got into Unix after perl and I've used it a lot. Back in the 1990s I saw
|Henry Spencer's joke that perl was the Swiss Army Chainsaw of Unix, as a
|riff on lex being its Swiss Army Knife. I came to appreciate lex
|regrettably late: lex makes it remarkably easy to chew through a huge pile
|of text and feed the pieces to some library code written in C. I've been
|using re2c recently (http://re2c.org/), which is differently weird than
|lex, though it still uses YY in all its variable names. It's remarkable
|how much newer lexer/parser generators can't escape from the user
|interface of lex/yacc. Another YY example: http://www.hwaci.com/sw/lemon/
P.S.: i really hate automated lexers. I never ever got used to
using them. For learning i once tried to use flex/bison, but
i failed really hard. I like that blood, sweat and tears thing,
and using a lexer seems so shattered, all the pieces. And i find
them really hard to read.
If you can deal with them they are surely a relief, especially in
rapidly moving syntax situations. But if i look at settled source
code which uses them, for example usr.sbin/ospfd/parse.y, or
usr.sbin/smtpd/parse.y, both of OpenBSD, then i feel lost and am
happy that i do not need to maintain that code.
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

On Sun, May 17, 2020 at 01:53:08AM +0200, Steffen Nurpmeso wrote:
> Tony Finch wrote in
> <alpine.DEB.2.20.2005142316170.3374@grey.csi.cam.ac.uk>:
> |Larry McVoy <lm@mcvoy.com> wrote:
> |>
> |> It's got some perl goodness, regexps are part of the syntax, ....
> |
> |I got into Unix after perl and I've used it a lot. Back in the 1990s I saw
> |Henry Spencer's joke that perl was the Swiss Army Chainsaw of Unix, as a
> |riff on lex being its Swiss Army Knife. I came to appreciate lex
> |regrettably late: lex makes it remarkably easy to chew through a huge pile
> |of text and feed the pieces to some library code written in C. I've been
> |using re2c recently (http://re2c.org/), which is differently weird than
> |lex, though it still uses YY in all its variable names. It's remarkable
> |how much newer lexer/parser generators can't escape from the user
> |interface of lex/yacc. Another YY example: http://www.hwaci.com/sw/lemon/
>
> P.S.: i really hate automated lexers. I never ever got used to
> using them. For learning i once tried to use flex/bison, but
> i failed really hard. I like that blood, sweat and tears thing,
> and using a lexer seems so shattered, all the pieces. And i find
> them really hard to read.
They are not bad if you are good at it. One of my guys has a PhD in
compilers and he's good at it.
They are not good at performance. BitKeeper has an extensive printf
like (sort of, different syntax) language that can be used to customize
log output. Rob originally did all that in flex/bison but the performance
started to hurt so he rewrote it all:
/*
 * This is a recursive-descent parser that implements the following
 * grammar for dspecs (where [[...]] indicates an optional clause
 * and {{...}} indicates 0 or more repetitions of):
 *
 * <stmt_list> -> {{ <stmt> }}
 * <stmt>      -> $if(<expr>){<stmt_list>}[[$else{<stmt_list>}]]
 *             -> $unless(<expr>){<stmt_list>}[[$else{<stmt_list>}]]
 *             -> $each(:ID:){<stmt_list>}
 *             -> ${<num>=<stmt_list>}
 *             -> <atom>
 * <expr>      -> <expr2> {{ <logop> <expr2> }}
 * <expr2>     -> <str> <relop> <str>
 *             -> <str>
 *             -> (<expr>)
 *             -> !<expr2>
 * <str>       -> {{ <atom> }}
 * <atom>      -> char
 *             -> escaped_char
 *             -> :ID:
 *             -> (:ID:)
 *             -> $<num>
 * <logop>     -> " && " | " || "
 * <relop>     -> "=" | "!=" | "=~"
 *             -> " -eq " | " -ne " | " -gt " | " -ge " | " -lt " | " -le "
 *
 * This grammar is ambiguous due to (:ID:) looking like a
 * parenthesized sub-expression. The code tries to parse (:ID:) first
 * as an $each variable, then as a regular :ID:, then as regular text.
 *
 * Note that this is broken: $if((:MERGE:)){:REV:}
 *
 * The following procedures can be thought of as implementing an
 * attribute grammar where the output parameters are synthesized
 * attributes which hold the expression values and the next token
 * of lookahead in some cases. It has been written for speed.
 *
 * NOTE: out==0 means evaluate but throw away.
 *
 * Written by Rob Netzer <rob@bolabs.com> with some hacking
 * by wscott & lm.
 */
That stuff screams perf wise.

On Fri, May 15, 2020 at 10:31:38PM +0100, Richard Tobin wrote:
> "The implementation shall define char to have the same range,
> representation, and behavior as either signed char or unsigned char."
> - C99
>
> (Technically it's a separate type from both of them.)
I was about to suggest I'd yet to come across a compiler which
handled them that way, but on checking I find that both clang
and gcc do now in effect have 3 types.
i.e. both 'unsigned char *' and 'signed char *' values passed to
a function taking 'char *' raise a warning.
I wonder when they started doing that?
DF

It technically probably always should have. void* (which has the same
format as char*) will accept either pointer type; char* shouldn't,
though I suspect that early compilers predating void* would have happily
converted any pointer to char* (or to int, for that matter).
-----Original Message-----
From: TUHS <tuhs-bounces@minnie.tuhs.org> On Behalf Of Derek Fawcus
Sent: Sunday, May 17, 2020 12:11 PM
To: tuhs@tuhs.org
Subject: Re: [TUHS] v7 K&R C
On Fri, May 15, 2020 at 10:31:38PM +0100, Richard Tobin wrote:
> "The implementation shall define char to have the same range,
> representation, and behavior as either signed char or unsigned char."
> - C99
>
> (Technically it's a separate type from both of them.)
I was about to suggest I'd yet to come across a compiler which handled them
that way, but on checking I find that both clang and gcc do now in effect
have 3 types.
i.e. both 'unsigned char *' and 'signed char *' values passed to a function
taking 'char *' raise a warning.
I wonder when they started doing that?
DF

On 5/16/20, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>
> Why was there no byte or "mem" type?
These days machine architecture has settled on the 8-bit byte as the
unit for addressing, but it wasn't always the case. The PDP-10
addressed memory in 36-bit units. The character manipulating
instructions could deal with a variety of different byte lengths: you
could store six 6-bit BCD characters per machine word, or five ASCII
7-bit characters (with a bit left over), or four 8-bit characters
(ASCII plus parity, with four bits left over), or four 9-bit
characters.
Regarding a "mem" type, take a look at BLISS. The only data type that
language has is the machine word.
> > getfield(buf)
> > char buf[];
> > {
> > 	int j;
> > 	char c;
> >
> > 	j = 0;
> > 	while((c = buf[j] = getc(iobuf)) >= 0)
> > 		if(c==':' || c=='\n') {
> > 			buf[j] = 0;
> > 			return(1);
> > 		} else
> > 			j++;
> > 	return(0);
> > }
>
> so here the EOF was different and char was signed 7-bit it seems.
That makes perfect sense if you're dealing with ASCII, which is a
7-bit character set.
-Paul W.

>
> so here the EOF was different and char was signed 7-bit it seems.
That makes perfect sense if you're dealing with ASCII, which is a 7-bit character set.
But that assumes you were reading "characters" rather than "bytes." Binary data certainly could be any combination of 8 bits and you'd want something out of band to signal errors/eof.

On Thu, May 14, 2020 at 10:21:07AM -0700, Larry McVoy wrote:
> On Wed, May 13, 2020 at 08:42:55PM -0400, John P. Linderman wrote:
> > I never liked call by reference. When I was trying to understand a chunk of
> > code, it was a great mental simplification to know that whatever a called
> > routine did, it couldn't have an effect on the code I was trying to
> > understand except through a returned value and (ghastly) global variables.
That has always been my issue with the C++ references, that one could not
read a piece of code in isolation, and know when a reference may be made.
I guess I'd be happy with references if the syntax always required one to
write '&x' when they're being created, then the called function can choose
if it either wishes to use a pointer or a reference, the only difference
being the syntax used to deref the reference.
As to Doug's point about new arithmetic types and overloading, I recall
a few years ago reading on the 9fans list about an extension there (in KenC?)
which supported them in C. I've not managed to dig up the details again,
maybe someone else could. As I recall it involved defining structs.
> Call by value is fine for things like a single integer or whatever. When
> you have some giant array, you want to pass a pointer.
>
> And "const" helps a lot with indicating the subroutine isn't going to
> change it.
However that is simply the ABI, i.e. it should be possible for a sufficiently
clever compiler to implement such a call-by-value as a call-by-constish-reference.
i.e. this:

	somefn(struct s p) { ... }

	struct s ss;
	somefn(ss);

in effect becomes syntax sugar for:

	somefn(constish ref struct s p) { ... }

	struct s ss;
	somefn(&ss);
Where 'constish' does not allow 'ss' to be altered even if somefn() assigns
to 'p', because in that case it would do sufficient copying so as to
make things work. There was a proposed MIPS ABI which stated this.
The current C semantics in effect do that, but with the ABIs always having
the caller make a copy and pass a reference to it, rather than allowing the
callee to make a copy if/when required.
DF

On 5/17/20, ron@ronnatalie.com <ron@ronnatalie.com> wrote:
>>
>> so here the EOF was different and char was signed 7-bit it seems.
>
> That makes perfect sense if you're dealing with ASCII, which is a 7-bit
> character set.
>
> But that assumes you were reading "characters" rather than "bytes." Binary
> data certainly could be any combination of 8 bits and you'd want something
> out of band to signal errors/eof.
Well, the function in question is called getchar(). And although
these days "byte" is synonymous with "8 bits", historically it meant
"the number of bits needed to store a single character".
-Paul W.

On Sun, May 17, 2020 at 12:38 PM Paul Winalski <paul.winalski@gmail.com>
wrote:
> Well, the function in question is called getchar(). And although
> these days "byte" is synonymous with "8 bits", historically it meant
> "the number of bits needed to store a single character".
>
Yep, I think that is the real crux of the issue. If you grew up with
systems that used a 5-, 6-, or even a 7-bit byte, you have an appreciation
of the difference. Remember, B, like BCPL and BLISS, only has a 'word' as
the storage unit. But by the late 1960s, the byte had been declared (thanks
to Fred Brooks shutting down Gene Amdahl's desires) to be 8 bits, at least at
IBM.** Of course, the issue was that ASCII used only 7 bits to store
a character.
DEC was still sort of transitioning from word-oriented hardware (a lesson,
Paul, that you and I lived through being forgotten a few years later with
Alpha); but the PDP-11, unlike the 18/36- or 12-bit systems, followed IBM's
lead and used the 8-bit byte and byte addressing. But that nasty 7-bit
ASCII thing messed it up a little. When C was created (for the 8-bit,
byte-addressed PDP-11), Dennis, unlike B, introduced different types. As he
says, "C is quirky," and one of those quirks is that he created a "char"
type, which was thus naturally 8 bits on the PDP-11, but which stored
7-bit ASCII data with a bit left over.
As previously said in this discussion, to me the issue is that it was called
a *char*, not a *byte*. But I wonder whether, if Dennis and team had had that
foresight, it would in practice have made that much difference. It took
many years, many lines of code, and trying to encode the glyphs of many
different natural languages to get to ideas like UTF.
As someone else pointed out, one of the other quirks of C was trying to
encode the return value of a function into a single 'word.' But like many
things in the world, we have to build it first and let it succeed before we
can find the real flaws. C was incredibly successful, and as I said before,
I'll not trade it for any other language yet, given what it has allowed me
and my peers to do over the years. I am humbled by what Dennis did; I doubt
many of us would have done as well. That doesn't make C perfect, or mean
that we cannot strive to do better, and maybe time will show Rust or Go to
be that. But I suspect that may still be a long time in the future. All my
CMU professors in the 1970s said Fortran was dead then. However, remember
that it still pays my salary, and my company makes a ton of money building
hardware that runs Fortran codes - it's not even close when you look at
what's number one [check out the application usage on one of the bigger HPC
sites in Europe -- I offer it because it's easy to find the data and the
graphics make it obvious what is happening:
https://www.archer.ac.uk/status/codes/ - other sites have similar stats,
but finding them is harder].
Clem
** As my friend Russ Robeolen (who was the chief designer of the S/360
Model 50) tells the story, Amdahl was madder than a hornet about it, but
Brooks pulled rank and kicked him out of his office. The S/360 was
supposed to be an ASCII machine - Amdahl thought the extra bit for a byte
was a waste -- but Brooks told him if it wasn't a power of 2, don't come
back -- that is, "if a byte was not a power of two he did not know how to
program for it efficiently, and SW being efficient was more important than
Amdahl's HW implementation!" (imagine that). Amdahl did get a 24-bit word
type, but Brooks made him define it so that 32 bits stored everything,
which again Amdahl thought was a waste of HW. [Bell would later note that
it was the single greatest design choice in the computer industry.]

On 2020-May-17 16:08:26 -0400, Clem Cole <clemc@ccc.com> wrote:
>On Sun, May 17, 2020 at 12:38 PM Paul Winalski <paul.winalski@gmail.com>
>wrote:
>
>> Well, the function in question is called getchar(). And although
>> these days "byte" is synonymous with "8 bits", historically it meant
>> "the number of bits needed to store a single character".
8-bit bytes, 32/64-bit "words" and 2's complement arithmetic have been
"standard" for so long that I suspect there are a significant number of
computing professionals who have never considered that there is any
alternative.
>Yep, I think that is the real crux of the issue. If you grew up with
>systems that used a 5, 6, or even a 7-bit byte; you have an appreciation of
>the difference.
I've used a 36-bit system that supported 6 or 9-bit bytes. IBM Stretch even
supported programmable character sizes.
>DEC was still sort of transitioning from word-oriented hardware (a lesson,
>Paul, you and I lived through being forgotten a few years later with
>Alpha);
The Alpha was byte addressed, it just didn't support byte operations on
memory (at least originally). That's different to word-oriented machines
that only supported word addresses. Supporting byte-wide writes at
arbitrary addresses adds a chunk of complexity to the CPU/cache interface
and most RISC architectures only supported word load/store operations.
--
Peter Jeremy

Paul Winalski <paul.winalski@gmail.com> wrote:
>
> Regarding a "mem" type, take a look at BLISS. The only data type that
> language has is the machine word.
BCPL and B were also word-based languages. The PDP-7 was a word-addressed
machine. If I understand the history correctly, the move to NB then C was
partly to make better use of the byte-addressed PDP11.
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
a fair, free and open society

Paul Winalski <paul.winalski@gmail.com> wrote:
>
> Why did the implementers of the Unix ABI for ARM decide to have char
> be unsigned? Was there an architectural reason for it?
The early ARM didn't have a sign-extended byte load instruction.
I learned C with the Norcroft ARM C compiler on the Acorn Archimedes in
1991/2ish. Norcroft C had quite a lot of unix flavour despite running on a
system that was not at all unixy. (I didn't get my hands on actual unix
until a couple of years later.) Acorn had a BSD port to the Archimedes
which I've never seen myself - the R260 was a pretty powerful system for
its time which I coveted from afar. I believe the 32 bit ARM ABI evolved
from the early 26 bit ABI on the Archimedes. (32 bit word, 26 bit address
space.)
http://chrisacorns.computinghistory.org.uk/RISCiXComputers.html
More recent versions of the instruction set have more features. I believe
the arm64 ABI uses signed char to match what everyone is used to. I still
think unsigned bytes are more sensible, but that's what I was taught at an
impressionable age...
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Trafalgar: North 3 or 4, occasionally 5 later. Moderate, occasionally slight
in east. Fair. Good.

On Mon, May 18, 2020 at 8:05 AM Tony Finch <dot@dotat.at> wrote:
> BCPL and B were also word-based languages.
Yes, that was the style of the systems language. IIRC PL/360 worked the
same way too.
> The PDP-7 was a word-addressed machine.
Correct.
> If I understand the history correctly, the move to NB then C was
> partly to make better use of the byte-addressed PDP11.
>
I never used NB, so you'll have to ask someone like Ken or Doug, as to when
the language became 'different enough' that Dennis felt it was time to
rename it. From conversations years ago with dmr, I was under the
impression the original additions were considered 'syntactic sugar ' at
first -- hints to help him generate better code for the PDP-11 (like
'register'). I think Steve was at Waterloo and still using B and when
he returned to MH, C had appeared, but he might be able to shed some light
on the transition.
Clearly the byte-address behavior of the 11 had a heavy influence on C.
As I said in my earlier email, I've sometimes wondered what would have
happened to the language if the data units had been byte, word, and ptr
only [or if DEC marketing had not screwed up how BLISS was released -
another story for COFF I suspect].
Clem

> [A]lthough these days "byte" is synonymous with "8 bits", historically it
> meant "the number of bits needed to store a single character".
It depends upon what you mean by "historically". Originally "byte"
was coined to refer to 8 bit addressable units on the IBM 7030 "Stretch"
computer. The term was perpetuated for the 360 family of computers. Only
later did people begin to attribute the meaning to non-addressable
6- or 9-bit units on 36- and 18-bit machines.
Viewed over history, the latter usage was transient and colloquial.
Doug

> On May 17, 2020, at 09:24, Paul Winalski <paul.winalski@gmail.com> wrote:
>
> ... The PDP-10 addressed memory in 36-bit units. The character manipulating
> instructions could deal with a variety of different byte lengths: you could
> store ... five ASCII 7-bit characters (with a bit left over) ...
IIRC, this format was called 5/7 IOPS ASCII. The PDP-7, 9, and 15 computers used
a variant of this format, but they had to start with a pair of (18-bit) words.
Around 1970, I wrote a pair of (assembly language) routines to extract and insert
characters, because our PDP-15 did NOT have character manipulating instructions.
-r

CDC NOS on the 6600 (I used a 70/74, actually) used 12 bits to store ASCII. Mostly, we used six bit display code. Since there was a printable character for every code, you could just look at the binary files. Ten characters to the word!
Brantley
> On May 18, 2020, at 11:13 AM, Rich Morin <rdm@cfcl.com> wrote:
>
>> On May 17, 2020, at 09:24, Paul Winalski <paul.winalski@gmail.com> wrote:
>>
>> ... The PDP-10 addressed memory in 36-bit units. The character manipulating
>> instructions could deal with a variety of different byte lengths: you could
>> store ... five ASCII 7-bit characters (with a bit left over) ...
>
> IIRC, this format was called 5/7 IOPS ASCII. The PDP-7, 9, and 15 computers used
> a variant of this format, but they had to start with a pair of (18-bit) words.
> Around 1970, I wrote a pair of (assembly language) routines to extract and insert
> characters, because our PDP-15 did NOT have character manipulating instructions.
>
> -r
>

On Sun, May 17, 2020 at 12:24 PM Paul Winalski <paul.winalski@gmail.com>
wrote:
> On 5/16/20, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> >
> > Why was there no byte or "mem" type?
>
> These days machine architecture has settled on the 8-bit byte as the
> unit for addressing, but it wasn't always the case. The PDP-10
> addressed memory in 36-bit units. The character manipulating
> instructions could deal with a variety of different byte lengths: you
> could store six 6-bit BCD characters per machine word,
Was this perhaps a typo for 9 4-bit BCD digits? I have heard that a reason
for the 36-bit word size of computers of that era was that the main
competition at the time was against mechanical calculator, which had
9-digit precision. 9*4=36, so 9 BCD digits could fit into a single word,
for parity with the competition.
6x6-bit data would certainly hold BAUDOT data, and I thought the Univac/CDC
machines supported a 6-bit character set? Does this live on in the Unisys
1100-series machines? I see some reference to FIELDATA online.
I feel like this might be drifting into COFF territory now; Cc'ing there.
or five ASCII
> 7-bit characters (with a bit left over), or four 8-bit characters
> (ASCII plus parity, with four bits left over), or four 9-bit
> characters.
>
> Regarding a "mem" type, take a look at BLISS. The only data type that
> language has is the machine word.
>
> > getfield(buf)
> > char buf[];
> > {
> > 	int j;
> > 	char c;
> >
> > 	j = 0;
> > 	while((c = buf[j] = getc(iobuf)) >= 0)
> > 		if(c==':' || c=='\n') {
> > 			buf[j] = 0;
> > 			return(1);
> > 		} else
> > 			j++;
> > 	return(0);
> > }
> >
> > so here the EOF was different and char was signed 7-bit it seems.
>
> That makes perfect sense if you're dealing with ASCII, which is a
> 7-bit character set.
To bring it back slightly to Unix, when Mary Ann and I were playing around
with First Edition on the emulated PDP-7 at LCM+L during the Unix50 event
last USENIX, I have a vague recollection that the B routine for reading a
character from stdin was either `getchar` or `getc`. I had some impression
that this did some magic necessary to extract a character from half of an
18-bit word (maybe it just zeroed the upper half of a word or something).
If I had to guess, I imagine that the coincidence between "character" and
"byte" in C is a quirk of this history, as opposed to any special hidden
meaning regarding textual vs binary data, particularly since Unix makes no
real distinction between the two: files are just unstructured bags of
bytes, they're called 'char' because that was just the way things had
always been.
- Dan C.

No typo. While BCD was a way of encoding digits, BCD was also used as a character encoding. Often these were outgrowths of the digit+zone punch encoding of IBM cards. IBM later extended their BCD encoding into… the EXtended Binary Coded Decimal Interchange Code, going from 6 to 8 bits in the process.
UNIVAC indeed had their own BCD-ish format called FIELDATA. It was notable in that the nul value printed as @.
The PDP-10 and the UNIVAC 1100 series were just perhaps the longest surviving of the 36-bit computers, which also included the IBM 70XX series and the GE 600 (Honeywell 6000) series. Both the UNIVAC and the PDP-10 did have the nice variable partial-word mode, but all of these were indeed word-addressed machines.
The early Crays were also word addressed. The C compiler simulated byte addressing by putting the byte offset into the word in the high-order bits (the address registers themselves were pinned at 24 bits).
Just to get this back on a UNIX history track, let me delve into more trivia.
Perhaps the real oddity was the Denelcor HEP. The HEP had two addressing modes: one was byte addressed (as you'd expect); the other was for all other data types (16-bit, 32-bit, and 64-bit portions of the 64-bit word). The lower 3 bits of the memory address encoded the word size. If it was 0 or 4, it was a 64-bit operand at the address specified in the higher part of the pointer. If it was 2 or 6, it was either the upper or the lower half word. If it was 1, 3, 5, or 7, it was one of the respective quarter words.
This caused a problem when we ported 4BSD to the thing. The Berkeley kernel (particularly in the I/O code) did what I called "conversion by union." It would store a value in a union using one type of pointer and then retrieve it via a different type. In our compiler, had they used a cast (or gone via a void* intermediary), everything would have been fine. But doing this sort of shenanigan (which is technically undefined behavior in C) led to insidious bugs where you'd be doing pointer operations and the WRONG size word would be referenced.
I spent a few days hunting all these down and fixing them.
It was about this time I realized that the code was setting up I/Os using a feature aptly named "The Low Speed Bus" and that we'd never get any reasonable performance that way. HEP designer Burton Smith and I redesigned the I/O system literally on napkins from the Golden Corral in Aberdeen. We went back and built a new I/O system out of spare parts we had on hand, using an 11/34 as a control processor. The HEP I/O system was kind of interesting in that, while it had a high-speed interface into the HEP's ECL memory, the thing consisted of 32 individual DEC UNIBUSes.

I should have checked my 7030 manual before asserting
that the 8-bit byte came from there. The term did,
but it meant an addressable unit of 1 to 8 bits
depending on the instruction being executed.
[The machine was addressable to the bit. It also
had all 16 bitwise logical operators, and
maintained counts of the 1 bits and leading
0 bits in a register. And it was BIG. I saw
one with 17 memory boxes (each essentially
identical with the total memory of a 7090)
stretched across the immaculate hardwood
floor of IBM's Poughkeepsie plant.]
Doug

On Mon, 18 May 2020, Doug McIlroy wrote:
> I should have checked my 7030 manual before asserting
> that the 8-bit byte came from there. The term did,
> but it meant an addressable unit of 1 to 8 bits
> depending on the instruction being executed.
>
> [The machine was addressable to the bit. It also
> had all 16 bitwise logical operators, and
> maintained counts of the 1 bits and leading
> 0 bits in a register. And it was BIG. I saw
> one with 17 memory boxes (each essentially
> identical with the total memory of a 7090)
> stretched across the immaculate hardwood
> floor of IBM's Poughkeepsie plant.]
>
> Doug
>
I used to go by that plant all the time, because I lived in Staatsburg two
towns north of Poughkeepsie for 3 years.
-uso.

On Mon, 18 May 2020, Peter Jeremy wrote:
> 8-bit bytes, 32/64-bit "words" and 2's complement arithmetic have been
> "standard" for so long that I suspect there are a significant number of
> computing professionals who have never considered that there is any
> alternative.
You haven't lived until you've dealt with a 1's-complement machine i.e. -0
!= 0 ... To be fair, it was *mostly* normalised.
>> Yep, I think that is the real crux of the issue. If you grew up with
>> systems that used a 5, 6, or even a 7-bit byte; you have an
>> appreciation of the difference.
>
> I've used a 36-bit system that supported 6 or 9-bit bytes. IBM Stretch
> even supported programmable character sizes.
Ever tried a Univac or a Honeywell? I don't remember the exact details,
and I prefer to keep it that way...
> The Alpha was byte addressed, it just didn't support byte operations on
> memory (at least originally). That's different to word-oriented
> machines that only supported word addresses. Supporting byte-wide
> writes at arbitrary addresses adds a chunk of complexity to the
> CPU/cache interface and most RISC architectures only supported word
> load/store operations.
I had to support an old Alpha once; that was one of the reasons why I was
happy to leave the joint. We had just one customer who used an Alpha, and
thus we/I had to maintain the thing.
And don't even ask me about HP-UX (just as well that they weren't called
Packard-Hewlett), nor Xenix, nor early Slowaris, nor National Cash
Registers, nor...
Excuse me, I now have to take my sleepy pills :-)
-- Dave

There was a recent message I now can't find that I wanted to reply to,
something about which type to use to get a certain effect.
I wanted to reply to say that I felt that it was not really the best way to
go, to have one set of type names that tried to denote both i) the semantics
of the data, and ii) the size of the item, using arbitrary names.
This came up for me when we started to try and write portable networking code.
There, you need to be able to specify very precisely how long fields are
(well, in lower-level protocols, which use non-printable formats). How to do
that in a way that was portable, in the compilers of the day, was a real
struggle. (It might be doable now, but I think the fixes that allowed it were
still just patches to something that had gone in the wrong direction, above.)
I created a series of macros for type definitions, ones that separately and
explicitly specified the semantics and size. They looked like 'xxxy', where
'xxx' was the semantics (signed and unsigned integers, bit field, etc),
and 'y' was a length indication (byte, short, long, and others). So you'd
see things like 'unsb' and 'intl'.
The interesting twist was a couple of unusual length specifiers; among them,
'w' stood for 'the machine's natural word length', and 'f' meant 'no
particular length, just whatever's fastest on this architecture/compiler, and
at least 16 bits'. The former was useful in OSy type code; the latter for
locals and things where nobody outside the machine would see them.
Then you'd have to have a file of macro definitions (only one per machine)
which translated them all into the local architecture/compiler - some didn't
go, of course (no 'unsb' on a PDP-11), but it all worked really, really well,
for many years.
E.g. at one point, as a dare/hack, I said I'd move the MOS operating system,
a version written in portable C (with that type-name system), to the AMD
29000 over one night. This wasn't totally crazy; I'd already gotten the
debugger (a DDT written in similar portable C) to run on the machine, so
I knew where the potholes were. I'd have to write a small amount of machine
language (which I could translate from the M68K version), but most of it
should just compile and go. I didn't quite make it - it wasn't quite running
when people started coming in the next morning - but IIRC it started to work
later that day.
Noel

On Sat, May 16, 2020 at 11:39:54AM -0600, Warner Losh wrote:
> On Sat, May 16, 2020 at 10:28 AM Paul Winalski <paul.winalski@gmail.com>
> wrote:
>
> > > On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
> > >
> > >Unfortunately, if c is char on a machine with unsigned chars, or it’s of
> > >type unsigned char, the EOF will never be detected.
> > >
> > > - while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is
> > now there */
> >
> > The function prototype for getchar() is: int getchar(void);
> >
> > It returns an int, not a char. In all likelihood this is specifically
> > *because* EOF is defined as -1. The above code works fine if c is an
> > int. One always has to be very careful when doing a typecast of a
> > function return value.
> >
>
> In the early days of my involvement with FreeBSD, I went through and fixed
> about a dozen cases where getopt was being assigned to a char and then
> compared with EOF. I'm certain that this is why. Also EOF has to be a value
> that's not representable by a character, or your 0xff bytes would disappear.
I think I remember a code review on one of my patches to du(1),
something about adding an option to ignore specific names when
recursing, and I remember either you or BDE chastising me about
while (ch = getopt(...), ch != EOF) :)
G'luck,
Peter
--
Peter Pentchev roam@ringlet.net roam@debian.org pp@storpool.com
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

On a vaguely-related note, for some years I've been using the phrase
"algebraic syntax" to characterize languages such as Algol, C/C++,
Fortran, Java(Script), Ruby, etc. Contrary examples might include
Assembler, COBOL, Forth, Lisp, RPG, etc.
However, I can't find this usage in Wikipedia or elsewhere in the
Intertubes. Am I simply confused? Is there a better term to use?
Inquiring gnomes need to mine...
-r

On Tue, 19 May 2020, Rich Morin wrote:
> On a vaguely-related note, for some years I've been using the phrase
> "algebraic syntax" to characterize languages such as Algol, C/C++,
> Fortran, Java(Script), Ruby, etc. Contrary examples might include
> Assembler, COBOL, Forth, Lisp, RPG, etc.
My benchmark is "Can it be described in BNF?" LISP, for example, would be
something like:
phrase: atom | "(" phrase ")"
> However, I can't find this usage in Wikipedia or elsewhere in the
> Intertubes. Am I simply confused? Is there a better term to use?
> Inquiring gnomes need to mine...
You are confused because you are relying upon Wikipedia :-) Well, someone
had to say it, so it may as well be me; as I keep saying, it's only as
accurate as the last idiot who updated it.
-- Dave, a Wikipedia editor