2017-05-20

On scripting languages

Michael Belivanakis 2017

Note: this is a first draft. It will be heavily edited. It may contain statements that are inaccurate or just plain wrong. It may also contain language that is inappropriate. There are bound to be corrections after I receive some feedback.

Historically, the difference between scripting languages and "real" programming languages has been thought of as being the presence or absence of a compilation step. However, from time to time we have seen interpreters for compiled languages, and we have also seen compilers for languages that were thought of as scripting languages. Furthermore, some scripting engines today internally compile to bytecode, and some even to machine code, while many compiled languages are compiled to bytecode instead of machine code, and this bytecode is at times interpreted. So, compiled vs. interpreted does not seem to be the real differentiating factor between real programming languages and scripting languages. Nonetheless, we can usually tell a scripting language when we see one. So, what is it that we see?

I would like to suggest that the actual differentiating factor between scripting languages and real programming languages is nothing but the presence or absence of strong typing. In other words, it boils down to presence or absence of semantic checking. The seemingly coincidental trend of strongly typed languages to be compiled, and of weakly typed languages to be interpreted, can be explained in full as a consequence of the primary choice of strong vs. weak typing:

If a language is strongly typed then it may contain detectable semantic errors, so a compilation step is very useful because it will unfailingly locate all the semantic errors that would otherwise only be detected at runtime.

On the other hand, if a language is weakly typed then the need to parse all of the code in advance is lessened, because the only errors that such parsing could possibly reveal would be syntactic ones.
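To make the point about errors surfacing only at runtime concrete, here is a minimal sketch in Python, used purely as an example of a language that checks types while the program runs. The function name and its parameters are hypothetical, invented for this illustration. The mistake sits in a branch that is rarely taken, so parsing the file reveals nothing and the common path runs fine.

```python
def describe_order(quantity, express):
    """Hypothetical helper that builds a label for an order."""
    label = "items: " + str(quantity)
    if express:
        # Bug: a string is concatenated with an integer.
        # Nothing flags this until this branch is actually executed.
        label = label + " surcharge: " + 5
    return label


# The common path works, so the bug can go unnoticed for a long time.
print(describe_order(3, express=False))

# The rare path fails, but only at the moment it runs.
try:
    describe_order(3, express=True)
except TypeError as error:
    print("detected only at runtime:", error)
```

In a language where the compiler checks types, the equivalent of the faulty line would typically be rejected before the program ever runs.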

And yet, people tend to like scripting languages, and tend to actually write lots of code in them, supposedly because they are "easier". This immediately brings to my mind the famous quote by Edsger W. Dijkstra, taken from a different context, but equally applicable to the situation at hand:

[...] some people found error messages they couldn't ignore more annoying than wrong results, and, when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.

Arguments I hear in favor of scripting languages

Argument: It is easy to write code in it; look, the "hello, world!" program is a one-liner.

Rebuttal: What this means is that this scripting language is a very good choice for writing the "hello, world!" program. The ease with which you may write "hello, world!" is no indication whatsoever about the ease with which a non-trivial system may be developed, tested, debugged, maintained, and extended. On the contrary, a scripting language which makes it possible for you to write "hello, world!" in a single line achieves this by introducing a few trade-offs: it offers built-in functionality without the need to explicitly import it, which in turn means that there are identifiers always in scope even when not needed; it does not require code to be placed in classes, which means that it is either not object-oriented, or it mixes paradigms; and it does not require code to be placed in functions, which means that either its syntax is trivial, or again, it mixes paradigms. The moment you write anything non-trivial, you will of course need to be able to import namespaces, and to put everything in classes and methods, so the fact that the language does not require them buys you close to nothing.

Argument: No, I mean it is really terse. There are many things besides "hello, world!" that I can write in one line.

Rebuttal: Sure, you can write them in one line. But can you read them? Terseness appears to be the modern trend, so as real programming languages keep evolving they are also receiving features that make more and more things possible in one line. Take lambdas and the fluent style of invocations for example. However, this is always at the expense of readability and debuggability. So, terseness is not the exclusive domain of scripting languages anymore, and to the extent that scripting languages fare better in this domain it is debatable whether it is an advantage or a disadvantage.

Argument: There are lots of libraries for it.

Rebuttal: Seriously? There are more libraries for your scripting language than there are for Java?

Argument: I don't have to compile it; I just write my code and run it.

Rebuttal: I also just write my code and run it. When I hit the "launch" button, my IDE compiles my code in the blink of an eye and runs it. The difference between you and me is that if I have made any errors, I am told so before wasting my time running it. But what am I saying, being told that there are errors in your code probably counts as a disadvantage for you, right?

Argument: I am not worried about errors, because I use testing.

Rebuttal: Testing is an indispensable quality assurance mechanism for software, but it does not, in and of itself, guarantee correctness. It is too custom-made, too subjective, and too fragmentary. You can easily forget to test something, you can easily test the wrong thing, and you can easily test "around" a bug, accidentally creating tests that pretty much require the bug to be in place in order to pass. Despite these deficiencies, testing is still very important, but it is nothing more than a weapon in our arsenal against bugs. This arsenal includes another weapon, which is closer to the forefront of the battle against bugs than testing is, and it is comprehensive, generic, 100% objective, and definitive. This weapon is called strong typing. It, too, is just another weapon, but it has long been considered fundamental and indispensable. Alas, this hard-won realization from times of yore seems to be lost on the modern generation of programmers, who think they are going to re-invent everything.
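As a small illustration of testing "around" a bug, consider the following Python sketch. The function and its test are hypothetical, made up for this example. The test was written by observing what the code returns rather than what it was meant to return, so it passes only as long as the bug stays in place.

```python
def last_n(items, n):
    """Hypothetical helper, intended to return the last n elements of a list."""
    # Bug: when n is 0 this returns the whole list instead of an empty one,
    # because items[-0:] is the same slice as items[0:].
    return items[-n:]


def test_last_n():
    assert last_n([1, 2, 3], 2) == [2, 3]
    # This assertion was copied from the observed output, not from the intent.
    # It now requires the bug to be present in order to pass.
    assert last_n([1, 2, 3], 0) == [1, 2, 3]


test_last_n()
print("all tests pass, and the bug is still there")
```

Anyone who later fixes the function will be greeted by a failing test, which is exactly the wrong incentive.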

Argument: It has lots and lots of built-in features.

Rebuttal: Sure, and that's why scripting languages are not entirely useless. If the only thing that matters is to accomplish a certain highly self-contained goal of severely limited scope in as little time as possible, then please, by all means, do go ahead and use your favorite scripting language with its awesome built-in features. However, if the project is bound to take a life of its own, you are far better off investing a couple of minutes to create a project in a real programming language, and to include the external libraries that will give you the same functionality in that language. Built-in features do not only come with benefits; in contrast to libraries, they are much more difficult to evolve, because even a minute change in them may break existing code. Also, built-in features usually have to be supported forever, even after better alternatives have been invented, or after they simply go out of style, so over time scripting languages tend to gather unnecessary baggage.

Argument: But really, it is so much easier! Look here, in one statement I obtain a list and assign its elements to individual variables!

Rebuttal: That's great, I bet this has slashed your time to market by half. Seriously, my compiled language of choice has its own unique, arcane, hacky syntax quirks which, if I wanted to, I could claim make things so much easier for me. Some of them are not even that arcane. For example, instead of having to add comments within a method about each one of the typeless arguments that it accepts, explaining what the actual type of the argument is, so that the IDE can parse those comments and provide me with some rudimentary argument type documentation and checking, I get to simply declare the type of each argument together with the argument, as part of the syntax of the language! Imagine that!
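The contrast between types described in comments and types declared in the syntax can be sketched in Python, which happens to allow both styles. The function names are hypothetical. Note the assumption that matters here: Python itself does not enforce these annotations when the program runs, so a separate checker (mypy is one such tool) or an IDE has to read them.

```python
def rectangle_area_documented(width, height):
    """Return the area of a rectangle.

    The types live only in this comment, where tools must dig them out:

    :type width: float
    :type height: float
    :rtype: float
    """
    return width * height


def rectangle_area(width: float, height: float) -> float:
    """Return the area of a rectangle; the types are part of the signature."""
    return width * height


# Declared types are available as structured data rather than as free text.
print(rectangle_area.__annotations__)
print(rectangle_area_documented.__annotations__)
```

The first function exposes nothing a tool can rely on without parsing prose; the second states its contract in a form that tools can read directly.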

Argument: It is trendy. It is hip.

Rebuttal: No contest here. I can't argue with hipsters.

The lack of semantic checking

Lack of semantic checking means that errors can be made, which will not be caught at the earliest moment possible, which is during compilation, or better yet, during editing in any decent IDE. Therefore, lack of semantic checking means that errors can be made more easily, which in turn inescapably means that there will be a somewhat increased number of bugs that will go undetected until production. This, by itself, is enough to classify scripting languages as unsuitable for everything but the most trivial usage, and the debate should be over right there; we should not need to say anything more.

But here is more, for the sake of the exercise.

Lack of semantic checking means that your IDE cannot provide you with many useful features that you get with strongly typed languages. Specifically, some or all of the following features are either limited or missing altogether:

Context-sensitive auto-completion. Since any parameter to any function can be of any type, the IDE usually has no clue as to which of the variables in scope may be passed as a parameter to a function and which may not. Therefore, it cannot be smart about suggesting what to auto-complete, and it has to suggest either everything that is in scope, or nothing at all.

Member auto-completion. Since any variable can be of any type, the IDE usually has no clue as to what member fields and functions are exposed by any given variable. Therefore, it cannot suggest anything.

Find all references. Since any variable can be of any type, the IDE usually has no clue as to where a given type is used, or if it is used at all. This in turn means that when you are looking for usages of some type you have to resort to text search, which is a sub-optimal solution. Text search requires constant fiddling with search options like whole word vs. any part of word, case sensitive vs. insensitive, current folder tree vs. whole project (if there is even such a notion), etc., and despite all the fiddling, it still usually includes irrelevant synonyms in the search results. Furthermore, text search is only useful when you already know what you are looking for, and you explicitly set out to look for it in particular. Contrast this with strongly typed languages where the IDE knows at any given moment all the locations where every single one of your identifiers is used, keeps giving you visual clues about them (including visual clues about identifiers that are unused), and can very accurately list all references with a single click.

Refactoring. When the IDE has no knowledge of the semantics of your code, it cannot perform any refactoring on it. IDEs that offer refactoring features on untyped languages are actually faking it; they should not be calling it refactoring, they should be calling it cunning search and replace. And needless to say, a) it is not always correct, and b) in the event that it will severely mess up your code, you will have no way of knowing until you run the code, because remember, there is no semantic checking.
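The limitations listed above can be seen in a few lines of Python. This is only a sketch, and the class and function names are hypothetical. The parameter of announce carries no declared type, so from the function body alone a tool cannot tell which members it has, and a rename of one speak method cannot be reliably separated from the other by text search.

```python
class Duck:
    def speak(self):
        return "quack"


class Loudspeaker:
    def speak(self):
        return "testing, one, two"


def announce(thing):
    # Nothing here says what `thing` is. Which `speak` is this a reference to?
    # Only the call sites decide, and they may be anywhere in the program.
    return thing.speak()


print(announce(Duck()))
print(announce(Loudspeaker()))

# An argument with no `speak` member at all is accepted without complaint
# until the call is actually executed.
try:
    announce(42)
except AttributeError as error:
    print("only discovered at runtime:", error)
```

Some tools do infer types from call sites, but that inference is best effort and depends on how much of the program the tool can see.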

The horrible syntax

Most scripting languages suffer from a severe case of capriciously arcane and miserably grotesque syntax. No, beauty is not in the eye of the beholder; if you think that the issue of PHP aesthetics is a subjective one, you should seek help from a qualified professional. The syntax of scripting languages tends to suffer either because their priorities are all wrong by design, or because they were hacked together in a weekend without too much thought, or simply due to plain incompetence on behalf of their creators.

Scripting languages that have their priorities wrong are, for example, all the shell scripting languages. Their priorities are wrong by design, because they aim to make strings (filenames) look and feel as if they are identifiers, so that you can type commands without having to enclose them in quotes, as if this convenience was the most important thing ever. Actually, it would have been absolutely fine to offer this convenience if all we ever wanted to do with these scripts was to just list sequences of programs to execute, but the moment we need to use any actual programming constructs, what we have in our hands is a string escaping nightmare of epic proportions.

A scripting language that owes its bad syntax to being hastily hacked together is JavaScript. Brendan Eich, its creator, has admitted that the prototype of JavaScript was developed in 10 days, and that the language was never meant for anything but short snippets. He is honest enough to speak of his own creation in derogatory terms, and to accept blame. (See TEDxVienna 2016, opening statement, "Hello, I am to blame for Javascript".) He is now involved with WebAssembly, aiming to replace JavaScript with something completely new and completely different. Also, pretty much anyone deeply involved with JavaScript will admit that it has serious problems. One of the most highly acclaimed books on the language is Douglas Crockford's JavaScript: The Good Parts by O'Reilly. You can take the title of the book as a hint.

A scripting language that owes its horrific syntax to lack of competence on behalf of its creator is PHP. Rasmus Lerdorf, its creator, is quoted on the Wikipedia article about PHP as saying "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."

So, from the above it should be obvious that most scripting languages are little toy projects that were created by hackers who simply wanted to prove to themselves that they could actually build something like that, without intending them to be used outside their own workbench.

The lack of semantic checking in scripting languages is usually not a conscious choice, but a consequence of the very limited effort that usually goes into creating them. The fact that some of them catch on and spread like wildfire simply shows how eager the industry is to adopt any contemptible piece of nonsense without any critical thinking whatsoever, as long as it helps solve some immediate problem at hand. It is a truly deplorable situation that kids nowadays learn JavaScript as their first programming language because it is so accessible to them: all you need to have is a browser, and one day instead of F11 for full-screen you accidentally hit F12 which opens up the developer tools, and you realize that you have an entire development environment sitting right there. The availability of JavaScript to small children is truly frightening.

Usually, once a language becomes extremely popular, tools are added that try to lessen the impact of its deficiencies, so today it is possible to have some semblance of semantic checking when programming in Python or in JavaScript, but since the checking has been added as an afterthought, it is always partial, unreliable, hacky, and generally an uphill battle.

The nonsense

I don't need to say much here. Just watch "Wat" by Gary Bernhardt from CodeMash 2012; it is only 4 minutes long: https://www.destroyallsoftware.com/talks/wat

The reason for all this nonsense is that these languages are hacks.

When the foundation that you are working on is a hack, then either anything you build upon it will in turn be a hack, or you are going to be putting in an enormous effort to circumvent the hackiness of the foundation and build something reasonable over it. Why handicap yourself?

That little performance issue

Performance is almost a non-issue, mostly because scripting languages tend to be used in situations where performance is not required, while in the rare cases where performance is necessary, external libraries can be used. (And there are of course some odd cases where performance is of concern, and yet a scripting language is chosen, and they do in fact suffer horrendous performance consequences, take Node.js for example.) This is important to state real quick before moving on, so as to be clear about it: on computationally expensive tasks, such as iterating over all color values of an image to manipulate each one of them, there is no way that a scripting language will perform anywhere close to Java, just as there is no way that Java will perform anywhere close to C++. Stop arguing about this.

What scripting languages are good for

Scripting languages are useful when embedded within more complex applications written in real programming languages, mainly as evaluators of user-supplied expressions, or, in the worst case, as executors of user-supplied code snippets.

Scripting languages are useful when shortening the development time from first opening the editor to the first run of the program is far more important than anything else. Under "anything else" we really include everything else: performance, understandability, maintainability, testability, everything, even correctness.

Scripting languages are useful when the program is so trivial, and its expected lifetime is so short, that it is hardly worth the effort of creating a new folder with a new project file in it. The corollary of this is that if it is worth creating a project for it, then it is worth using a real programming language.

Scripting languages are useful when the code to be written is so simple that bugs can be easily detected by simply skimming through the code. The corollary of this is that if the program is to be even slightly complex, it should be written in a real programming language. (Adding insult to injury, scripting languages tend to have such capricious write-only syntax that it is very hard to grasp what any given line of code does, let alone vouch for it being bug-free.)

Conclusion

So, you might ask, what about the hundreds of thousands of successful projects written in scripting languages? Are they all junk? Do they represent a massive waste of time? And what about the hundreds of thousands of programmers all over the world who are making extensive use of scripting languages every day and are happy with them? Are they all misguided? Can't they see all these problems? Are they all ensnared in a monstrous collective delusion?