It's a bit subjective, but I'm hoping to get a clearer understanding of what factors make an operator clear to use vs obtuse and difficult. I've been considering language designs recently, and one issue that I always circle back around on is when to make some key operation in the language an operator, and when to use a keyword or function.

Haskell is somewhat notorious for this, as custom operators are easy to create and a new data type often comes packaged with several operators for use on it. The Parsec library, for example, comes with a ton of operators for combining parsers, with gems like >. and .> among them. I can't even recall what they mean right now, but I do remember them being very easy to work with once I had memorized what they actually meant. Would a function call such as leftCompose(parser1, parser2) have been better? Certainly more verbose, but clearer in some ways.
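For comparison, here is a minimal sketch of that trade-off in plain Haskell. It has no parsec dependency: the Maybe Applicative stands in for a parser, and leftCompose/rightCompose are invented names for the standard Applicative operators <* and *>, which play the "combine two parsers, keep one result" role.

```haskell
-- The stock Applicative operators <* and *> do what a named
-- leftCompose/rightCompose would spell out explicitly.
leftCompose :: Applicative f => f a -> f b -> f a
leftCompose p q = p <* q    -- run both, keep the first result

rightCompose :: Applicative f => f a -> f b -> f b
rightCompose p q = p *> q   -- run both, keep the second result

-- With Maybe standing in for a parser ("Nothing" = parse failure):
keepLeft :: Maybe Int
keepLeft = Just 1 <* Just 2                               -- Just 1

bothNeeded :: Maybe Int
bothNeeded = leftCompose (Just 1) (Nothing :: Maybe Char) -- Nothing
```

The named forms are unambiguous on first reading; the operator forms are shorter once memorized, which is exactly the trade-off at issue.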

Operator overloads in C-like languages pose a similar issue, compounded by the additional problem of overloading familiar operators like + with unusual new meanings.

In any new language, this would seem like a pretty tough issue. In F#, for example, casting uses a mathematically derived type-casting operator, instead of the C# style cast syntax or the verbose VB style:

C#: (int32) x
VB: CType(x, int32)
F#: x :> int32

In theory a new language could have operators for most built-in functionality. Instead of def or dec or var for variable declaration, why not ! name or @ name or something similar? It certainly shortens up declaration followed by binding: @x := 5 instead of declare x = 5 or let x = 5. Most code is going to require a lot of variable definitions, so why not?

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.

I wish one of the 1,000 developers who have ever complained to me about Java's lack of operator overloading would answer this question.
– smp7d, Mar 15 '12 at 16:45


I had thought of APL as I finished typing the question, but in some ways that clarifies the point and moves it into somewhat less subjective territory. I think most people could agree that APL loses something in ease-of-use through its extreme number of operators, but the amount of people who complain when a language doesn't have operator overloading surely speaks in favor of some functions being well suited to act as operators. Between forbidden custom operators and nothing but operators, there must hide some guidelines for practical implementation and usage.
– CodexArcanum, Mar 15 '12 at 17:15

As I see it, lack of operator overloading is objectionable because it privileges native types over user-defined types (note that Java does this in other ways as well). I am much more ambivalent about Haskell-style custom operators, which seem like an open invitation to trouble...
– comingstorm, Mar 15 '12 at 18:10


@comingstorm: I actually think the Haskell way is better. When you have a finite set of operators that you can overload in different ways, you're often forced to reuse operators in different contexts (e.g. + for string concatenation or << for streams). With Haskell, on the other hand, an operator is either just a function with a custom name (that is, not overloaded) or part of a type class which means that while it's polymorphic, it does the same logical thing for every type, and even has the same type signature. So >> is >> for every type and is never going to be a bit shift.
– Tikhon Jelvis, Mar 16 '12 at 9:19
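A concrete sketch of the distinction made in the comment above, in plain Haskell (the class Combine and the operators |> and .+. are invented names for illustration):

```haskell
-- Option 1: an operator that is just a function with a symbolic name,
-- not overloaded at all.
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Option 2: an operator that is a type class method: polymorphic, but
-- with one type signature and one logical meaning at every instance.
class Combine a where
  (.+.) :: a -> a -> a

instance Combine Int where
  (.+.) = (+)

instance Combine [b] where
  (.+.) = (++)    -- still "combine two values of the same type"
```

Either way, an operator cannot mean "add" for one type and "bit shift" for another, which is the reuse problem the comment describes.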

7 Answers

From a general language design point of view, there need not be any difference between functions and operators. One could describe functions as prefix operators with any (or even variable) arity, and keywords can be seen as just functions or operators with reserved names (which is useful in designing a grammar).
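Haskell makes this symmetry concrete, since the function/operator distinction there is purely syntactic. A small sketch, using only the Prelude:

```haskell
-- Any operator can be used prefix by parenthesising it; any binary
-- function can be used infix by wrapping it in backticks.
prefixPlus :: Int
prefixPlus = (+) 1 2        -- operator in prefix position

infixDiv :: Int
infixDiv = 10 `div` 3       -- named function in infix position
```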

All of these decisions eventually come down to how you want the notation to read, and that, as you say, is subjective, though one can make some obvious rationalisations, e.g. use the usual infix operators for maths, since everyone knows them.

To me, an operator stops being useful when you can no longer read the line of code out loud or in your head, and have it make sense.

For example, declare x = 5 reads as "declare x equals 5", and let x = 5 can be read as "let x equal 5", both of which are very understandable when read out loud. However, @x := 5 reads as "at x colon equals 5" (or "at x is defined to be 5" if you're a mathematician), which doesn't make sense at all.

So in my opinion, use an operator if the code can be read out loud and understood by a reasonably intelligent person not familiar with the language, but use function names if not.

I don't think @x := 5 is hard to figure out - I still read it aloud to myself as set x to be equal to 5.
– FrustratedWithFormsDesigner, Mar 15 '12 at 16:21


@Rachel, in that case, := has a broader use in mathematical notation outside of programming languages. Also, statements like x = x + 1 become a bit absurd.
– ccoakley, Mar 15 '12 at 16:58


Ironically, I had forgotten that some people wouldn't recognize := as assignment. A few languages actually do use that. Smalltalk, I think, being maybe the best known. Whether assignment and equality should be separate operators is a whole different can of worms, one probably covered by another question already.
– CodexArcanum, Mar 15 '12 at 17:12


@ccoakley Thanks, I wasn't aware that := was used in mathematics. The one definition I found for it was "is defined to be", so @x := 5 would translate into English as "at x is defined to be 5", which to me still doesn't make as much sense as it should.
– Rachel, Mar 15 '12 at 17:23


Certainly Pascal used := as assignment, for the lack of a left-arrow in ASCII.
– Vatine, Mar 15 '12 at 17:54

An operator is clear and useful when it is familiar. That probably means operators should be overloaded only when the chosen operator is close enough (in form, precedence and associativity) to an already established practice.

But digging a little further, there are two aspects: the syntax (infix for operators, as opposed to prefix for function calls) and the naming.

For the syntax, there is one case where infix is enough clearer than prefix that it may be worth the effort of becoming familiar with it: when calls are chained and nested.

Compare a * b + c * d + e * f with +(+(*(a, b), *(c, d)), *(e, f)) or + + * a b * c d * e f (both are parsable under the same conditions as the infix version). The fact that each + separates its terms, instead of preceding one of them at a distance, makes the infix version more readable in my eyes (though you have to remember precedence rules, which aren't needed for prefix syntax). If you need to compound things in this way, the long-term benefit may be worth the learning cost. And you keep this advantage even if you name your operators with letters instead of symbols.
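The comparison above can be checked mechanically; here is a small sketch in Haskell, chosen purely as a convenient calculator:

```haskell
-- Infix form: relies on * binding tighter than +.
infixForm :: Int -> Int -> Int -> Int -> Int -> Int -> Int
infixForm a b c d e f = a * b + c * d + e * f

-- Fully prefix form: no precedence rules needed, but each operator
-- sits a long way from the terms it combines.
prefixForm :: Int -> Int -> Int -> Int -> Int -> Int -> Int
prefixForm a b c d e f = (+) ((+) ((*) a b) ((*) c d)) ((*) e f)
```

Both compute the same value for any inputs; only the readability differs.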

Concerning the naming of operators, I see few advantages in using anything other than established symbols or clear names. Yes, novel symbols may be shorter, but they are cryptic and will be quickly forgotten if you don't have a good reason to already know them.

A third aspect, from a language-design point of view, is whether the set of operators should be open or closed -- can users add operators or not -- and whether priority and associativity should be user-specifiable. My first take would be to be careful and not provide user-specifiable priority and associativity, while being more open to an open set (it would increase the pressure to use operator overloading, but would decrease the pressure to reuse a bad operator just to benefit from the infix syntax where it is useful).
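Haskell is one data point on the open-set side: the operator, its priority and its associativity are all author-specified via a fixity declaration. A sketch (the operator <+> is invented for illustration):

```haskell
-- The fixity declaration gives the new operator a precedence level
-- and an associativity; here: left-associative, same level as (+).
infixl 6 <+>

(<+>) :: Int -> Int -> Int
x <+> y = x + y + 1     -- an arbitrary demonstration operation

chained :: Int
chained = 1 <+> 2 <+> 3   -- parsed as (1 <+> 2) <+> 3
```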

If users can add operators, why not let them set the precedence and associativity of the operators they've added?
– Tikhon Jelvis, Mar 16 '12 at 9:30

@TikhonJelvis, it was mostly conservativeness, but there are some aspects to consider if you want to use it for anything more than examples. For instance, you want to bind the priority and associativity to the operator, not to a given member of the overloaded set (you choose the member after parsing is done, so the choice can't influence the parsing). Thus, to integrate libraries using the same operator, they must agree on its priority and associativity. Before adding the possibility, I'd try to find out why so few languages (AFAIK none) followed Algol 68 in allowing them to be defined.
– AProgrammer, Mar 16 '12 at 10:20

Well, Haskell lets you define arbitrary operators and set their precedence and associativity. The difference is that you can't overload operators per se--they behave like normal functions. This means you can't have different libraries trying to use the same operator for different things. Since you can define your own operator, there's much less reason to reuse the same ones for different things.
– Tikhon Jelvis, Mar 17 '12 at 0:47

Operators-as-symbols are useful when they intuitively make sense. + and - are obviously addition and subtraction, for example, which is why pretty much every language ever uses them as its operators. The same goes for comparison: < and > are taught in grade school, and <= and >= are intuitive extensions of the basic comparisons once you accept that there are no underlined-comparison keys on a standard keyboard.

* and / are less immediately obvious, but they're used universally and their position on the numeric keypad right next to the + and - keys helps provide context. C's << and >> are in the same category: when you understand what a left-shift and a right-shift are, it's not too difficult to understand the mnemonics behind the arrows.

Then you come to some of the truly strange ones, stuff that can turn C code into unreadable operator soup. (Picking on C here because it's a very operator-heavy language whose syntax pretty much everyone is familiar with, either from C itself or from its descendants.) For example, the % operator. Everyone knows what that is: it's a percent sign... right? Except in C it has nothing to do with division by 100; it's the modulus operator. That makes zero sense mnemonically, but there it is!

Even worse are the boolean operators. & as an and makes sense, except that there are two different & operators which do two different things, and both are syntactically valid in any case where one of them is valid, even though only one of them ever actually makes sense in any given case. That's just a recipe for confusion! The or, xor and not operators are even worse, since they don't use mnemonically useful symbols, and aren't consistent either. (No double-xor operator, and the logical and bitwise not use two different symbols instead of one symbol and a doubled-up version.)

And just to make it worse, the & and * operators get reused for completely different things, which makes the language harder for both people and compilers to parse. (What does A * B mean? It depends entirely on the context: is A a type or a variable?)

Admittedly, I never spoke with Dennis Ritchie and asked him about the design decisions he made in the language, but so much of it feels like the philosophy behind operators was "symbols for the sake of symbols."

Contrast this with Pascal, which has the same operators as C, but a different philosophy in representing them: operators should be intuitive and easy to read. The arithmetic operators are the same. The modulus operator is the word mod, because there's no symbol on the keyboard that obviously means "modulus". The logical operators are and, or, xor and not, and there's only one of each; the compiler knows whether you need the boolean or bitwise version based on whether you're operating on booleans or numbers, eliminating a whole class of errors. This makes Pascal code far easier to understand than C code, especially in a modern editor with syntax highlighting so the operators and keywords are visually distinct from identifiers.
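A related way to eliminate the same class of errors is through types rather than a shared spelling. A sketch of that idea in Haskell, using only base's Data.Bits: the two "and" operations keep separate names but have incompatible types, so using the wrong one fails to compile instead of silently misbehaving.

```haskell
import Data.Bits ((.&.))

-- (&&) works only on Bool; (.&.) works only on Bits instances.
-- Mixing them up is a type error, not a runtime surprise.
logicalAnd :: Bool
logicalAnd = True && False

bitwiseAnd :: Int
bitwiseAnd = 12 .&. 10      -- 1100 AND 1010 = 1000 = 8
```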

Pascal doesn't represent a different philosophy at all. If you read Wirth's papers, he used symbols for things like and and or all the time. When he developed Pascal, however, he did it on a Control Data mainframe, which used 6-bit characters. That forced the decision to use words for many things, for the simple reason that the character set simply didn't have nearly as much space for symbols and such as something like ASCII. I've used both C and Pascal, and find C substantially more readable. You may prefer Pascal, but that's a matter of taste, not fact.
– Jerry Coffin, Mar 15 '12 at 17:33

To add to @JerryCoffin's remark about Wirth: his subsequent languages (Modula, Oberon) use more symbols.
– AProgrammer, Mar 15 '12 at 18:28

@AProgrammer: And they never went anywhere. ;)
– Mason Wheeler, Mar 15 '12 at 18:46

I think | (and, by extension, ||) for or makes sense. You also see it all the time for alternation in other contexts, like grammars, and it looks like it's splitting two options.
– Tikhon Jelvis, Mar 15 '12 at 20:21


An advantage of letter-based keywords is that the space of them is much larger than symbols. For example, a Pascal-style language can include operators for both "modulus" and "remainder", and for integer versus floating-point division (which have different meanings beyond the types of their operands).
– supercat, Feb 24 '14 at 23:45

I think operators are much more readable when their function closely mirrors their familiar, mathematical purpose. For example, standard operators such as +, -, etc. are more readable than Add or Subtract when performing addition and subtraction. The operator's behavior must be clearly defined. I can tolerate an overload of the meaning of + for list concatenation, for example. Ideally the operation would not have side effects, returning a value instead of mutating.
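In Haskell terms, that "tolerable overload" for concatenation even has a sanctioned home: the Semigroup operator <> means "associatively combine" for every instance, so + never has to be borrowed. A sketch (GHC 8.4 or later, where <> is in the Prelude):

```haskell
-- One operator, one meaning ("combine"), many instances.
joinedList :: [Int]
joinedList = [1, 2] <> [3]        -- list concatenation

joinedString :: String
joinedString = "fold" <> "Right"  -- String is [Char], same instance
```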

I've struggled with shortcut operators for functions like fold, though. Obviously more experience with them would ease this, but I find foldRight more readable than /: .

Maybe frequency of use comes into it as well. I wouldn't think you fold often enough that an operator saves a whole lot of time. That might also be why the LISPy (+ 1 2 3) seems odd but (add 1 2 3) is less so. We tend to think of operators as strictly binary. Parsec's >>. style operators may be goofy, but at least the primary thing you do in Parsec is combine parsers, so operators for it do get a lot of use.
– CodexArcanum, Mar 15 '12 at 17:20

The problem with operators is that there's only a small number of them compared to the number of sensical (!) method names that could be used instead. A side effect is that user-definable operators tend to introduce a lot of overloading.

Certainly, the line C = A * x + b * c is easier to write and read than C = A.multiplyVector(x).addVector(b.multiplyScalar(c)).

Or, well, it's certainly easier to write, if you've got all overloaded versions of all those operators in your head. And you can read it afterwards, while fixing the second-to-last bug.

Now, for most code that's going to be running out there, that's fine. "The project passes all tests, and it runs" - for many a piece of software that's all we could ever want.

But things work out differently when you're doing software that goes into critical areas. Safety critical stuff. Security. Software that keeps planes in the air, or that keeps nuclear facilities running. Or software that encrypts your highly confidential emails.

For security experts that specialize in auditing code, this can be a nightmare. The mix of an abundance of user-defined operators and operator overloading can leave them in the unpleasant situation that they have a hard time figuring out what code is going to run eventually.

So, as stated in the original question, this is a highly subjective subject. And while many a suggestion about how operators should be used might sound perfectly sensical, the combination of only a few of them might create a lot of trouble in the long run.
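To make the C = A * x + b * c example above concrete, here is a toy sketch in Haskell; the V2 type and its (deliberately loose) Num instance are invented for illustration:

```haskell
-- A 2-component vector with a componentwise Num instance.  Real
-- libraries are stricter about what Num should mean for vectors.
data V2 = V2 Double Double deriving (Eq, Show)

instance Num V2 where
  V2 a b + V2 c d = V2 (a + c) (b + d)
  V2 a b * V2 c d = V2 (a * c) (b * d)   -- componentwise, not matrix
  negate (V2 a b) = V2 (negate a) (negate b)
  abs    (V2 a b) = V2 (abs a) (abs b)
  signum (V2 a b) = V2 (signum a) (signum b)
  fromInteger n   = V2 (fromInteger n) (fromInteger n)

-- The overloaded form reads like the maths...
result :: V2
result = V2 1 2 * V2 3 4 + V2 0.5 0.5
-- ...but an auditor must know exactly which (*) and (+) are in scope.
```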

I suppose the most extreme example is APL; the following program is the "game of life":

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

And if you can work out what it all means without spending a couple of days with the reference manual then good luck to you!

The problem is you have a set of symbols that nearly everyone intuitively understands: +, -, *, /, %, =, ==, &

and the rest, which require some explanation before they can be understood and are generally specific to the particular language. For instance, there is no obvious symbol to specify which member of an array you want ([], () and even "." have been used), but equally there is no obvious keyword you could easily use, so there is a case to be made for judicious use of operators. Just not too many of them.