Summary
Will programming languages ever reach the expressive power and human readability of natural languages?

In one of his recent articles, LISP jock Paul Graham takes a look at the future of programming languages and gives his view on the question of what "kind of programming language [we] will use to write the software controlling [the] flying cars". What follows is an interesting explanation of his notion that "like species, languages will form evolutionary trees, with dead-ends branching off all over". As a LISP adept he, of course, predicts that Java is one of those dead branches, but that aside. The interesting thing is that at the same time a friend of mine sent me the following piece of text. Please try to read it (I can't find the origin, but you can easily type a similar piece yourself; for fun, try communicating with somebody you know without using vowels):

"... randomising letters in the middle of words [has] little or no effect on the ability of skilled readers to understand the text. This is easy to denmtrasote. In a pubiltacion of New Scnieitst you could ramdinose all the letetrs, keipeng the first two and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis did not come to much beucase the thoery at the time was for shape and senqeuce retigcionon. Saberi's work sugsegts we may have some pofrweul palrlael prsooscers at work. The resaon for this is suerly that idnetiyfing coentnt by paarllel prseocsing speeds up regnicoiton. We only need the first and last two letetrs to spot chganes in meniang."

Did you understand any of it? You probably did (unless you don't speak English). What struck me, again, is the incomprehensibility of my own brain. Even with a background in cognitive science I'm often amazed by the way a brain can fill in missing information and still come up with the right meaning. The blind spot in your eyes is a well-known example, but the piece of text above is another fine one. Writing your Weblog in the above language would certainly make it hard for people to find it using existing search engines, unless their spelling is as bad as the text.
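The transformation shown in the quoted passage (shuffling a word's interior letters while keeping the first and last letters in place) is easy to reproduce; here is a minimal Python sketch, with function names invented for the example, so you can garble text of your own:

```python
import random
import re

def scramble_word(word):
    """Shuffle the interior letters of a word, keeping the
    first and last letters where they are."""
    if len(word) <= 3:
        return word  # nothing in the interior to shuffle
    interior = list(word[1:-1])
    random.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

def scramble_text(text):
    # Only scramble alphabetic runs; punctuation and spacing stay intact,
    # which is part of what keeps the result readable.
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0)), text)

print(scramble_text("randomising letters in the middle of words"))
```

Running it a few times on a paragraph of your own makes the effect easy to verify: the first and last letters, word boundaries, and punctuation carry enough information for a skilled reader to recover the meaning.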

The interesting aspect of the example is that if you look at natural languages as programming languages, you could say that humans are really bad coders, syntactically speaking. Their code would be bloated with spelling errors. Besides that, they would make all kinds of gross assumptions about the compiler. On the other hand, you could say that humans are excellent compilers. Even with corrupt input like the text above, humans are able to figure out, partly based on environmental knowledge, what the coder meant to say. So semantically speaking, you could say humans are excellent coders in the programming language called 'natural' ('supernatural communication' would be natural++).

The difference between programming languages and natural languages is obvious here; programming languages don't allow for noise, while noise is what makes human communication possible (and interesting, ask any Hendrix fan). Since the receiver allows the sender to be sloppy, the chance of a small error causing the communication to break down is reduced. Programming languages don't allow for sloppiness; every statement has to be crystal clear. That's a huge difference.

Returning to Paul Graham's metaphor of programming languages as evolutionary trees (including dead branches), I must say that the above made me once again realize that programming languages are human artifacts. Unfortunately a human trait is to clean things up (for most humans at least), meaning that I doubt whether the human-driven 'evolution' in programming languages will lead to a "forgiving" language that enables programmers to build complex things (like real trees, e.g. an oak, for the Java aficionados) using sloppy statements. As of now you could say that most languages only allow programmers to build trees with perpendicular branches and square roots that stand no chance of surviving outside the atrium of a NY design museum. Real trees can be beautiful or ugly, but they're always easy to grow and difficult to engineer (ask any Bonsai gardener).

Despite the growing consciousness that programming is more like gardening than like engineering (read for example the interview with Andy Hunt and Dave Thomas), I'm still in doubt whether programming languages are 'organic' enough to grow a garden. That would require a 'natural programming language' that allows for sloppiness and noisy communication. Only with such a tool could a human ever grow a mighty oak: by simply, but carefully, placing a seed.

I guess I'm in favor of Ken Arnold's radical notion that programmers are humans, and that a programming language is the interface between the human and the offered functionality (the API). The consequence of that observation is that we need to make programming languages as much like a natural language as possible, since humans are naturally built coders/compilers for natural languages. The question then is whether humans themselves can construct such a language, since a defining characteristic of natural languages is that they evolve instead of being engineered. Evolution of a programming language is only possible if individuals are allowed to tweak it as they please. Do programming languages allow for that?

One of the primary issues with programming language development is that we keep starting over. Languages have explicit limitations in expressiveness or usability that drive people to create new ones when they become too frustrated. Open-source projects for language tools may help keep this from happening so often. Each time we start over with a new language, everyone is compelled to rewrite all their favorite libraries in it.

I think that the real issue is getting people to think of their existing programming language as a toolset, and not a language. I love the expressiveness of Java. Its side effects are simple enough that I don't have to worry about the silly things that plague C++ programmers.

We need to work on developing new layers of expression on top of the layers that exist so that we can work towards natural languages. We've done this to get from machine code to assembler, and from assembler to procedural languages such as C and C++. Java, Lisp, Python, Perl, Smalltalk and others have provided another layer to create their own virtual machine languages of sorts.

Now we need to think about the next level, one that uses programming-language constructs to build expressions that solve the harder problems. We all know about modular and object-oriented programming. Designing solutions using these tools, instead of writing new expression facades in the form of new languages, is, in my opinion, the best choice...

I am suspicious of the idea of making programming languages like human languages. It would probably make the code easier to read, but it would also make the code less structured, fuzzier, and harder to pin down in its real purpose. That may turn out to be more costly to maintain and understand than what we have now.

Actually, I think programming languages are in some ways better than natural languages; the fact that we are more familiar with the latter does not necessarily mean it is better or would make our code better.

> I'm still in doubt whether programming languages are 'organic' enough to grow a garden.

The evolution of a medium-to-large code-base can be very organic though... whilst many of us would stare at such a thing and immediately start refactoring for all we're worth, in the real world many code-bases that have grown through the contributions of many individuals or organisations bear more resemblance to the garden you suggest than to any engineered structure.

There is a lot to be said against such a natural language though... the most prominent point would be how to infer what was intended. Many a person will confuse 'their' with 'there' and vice versa, and within a programming language such slips could confuse things to the degree where the flow of the program is changed.

The point is that even with the garbled text presented, you still conceded that there were rules and patterns to the text that dictated what was intended... but my point would be that there are things I read that are flawed to the point where the actual meaning is difficult to infer... then even I, as a human, have to re-parse the paragraph a couple of times to understand what was intended.

Trying to create a useful programming language that is close to natural language is just not practical right now. At the most basic level, programming is the creation of a specification with zero ambiguities for how to perform a task. The requirement of no ambiguities is why source code in the most useful programming languages resembles a mathematical proof more than a specification written in natural language. To move towards programming in natural language would require a technology that could recognize the ambiguities in a natural language statement, and then explain the problem in an understandable way to the programmer. Clearly, it is going to be a long time before we have a practical technology like this available to the masses.

However, the picture is not that bleak. In my opinion the best hope for improving programmer productivity is through domain specific languages. Obviously not all problems require a new language. New APIs are often sufficient. However, problems in many areas would benefit from a compact and powerful notation. Of course, many such languages already exist, but paying closer attention to places where a new language would increase productivity could be beneficial.
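To make "compact and powerful notation" concrete, here is a toy sketch of a domain-specific notation embedded in a general-purpose language; the `Quantity` class and the `m()` helper are invented purely for illustration, not taken from any real library:

```python
# A toy internal DSL for arithmetic on physical quantities. Operator
# overloading gives the domain a compact notation without inventing a
# whole new language.
class Quantity:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        # Adding metres to seconds is a domain error, caught at once.
        if self.unit != other.unit:
            raise ValueError("unit mismatch: %s vs %s" % (self.unit, other.unit))
        return Quantity(self.value + other.value, self.unit)

    def __mul__(self, factor):
        return Quantity(self.value * factor, self.unit)

    def __repr__(self):
        return "%g %s" % (self.value, self.unit)

def m(value):
    """Shorthand constructor for a length in metres."""
    return Quantity(value, "m")

print(m(3) + m(4) * 2)  # → 11 m
```

The point of the sketch is the design choice: the domain rules (units must match) live inside the notation itself, so an invalid expression fails immediately rather than producing a silently wrong number.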

Tree parsing and transformation is one area that I really think needs a better language. We have technologies like XSLT and the tree parsing/transformation support in ANTLR (formerly called SORCERER), but neither of these technologies seems very polished. XSLT suffers from many of the readability problems of LISP. Different things should really look different, to take advantage of the brain's amazing pattern-matching abilities and avoid spending all our time 'parsing' instead of 'reading'. While the tree parsing abilities of ANTLR are quite useful for constructing compilers, they are too cumbersome for the data transformation tasks that XSLT was designed for.
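For contrast, a rough sketch of how small a recursive tree transformation can be when written directly in a general-purpose language; the tuple-based tree shape and the tag-renaming rule here are assumptions made up for the example, not how XSLT or ANTLR represent trees:

```python
def transform(node, rules):
    """Recursively rewrite a tree of (tag, children) tuples.
    `rules` maps an old tag name to its replacement; string
    leaves are passed through unchanged."""
    if isinstance(node, str):
        return node
    tag, children = node
    return (rules.get(tag, tag), [transform(c, rules) for c in children])

doc = ("p", ["Hello ", ("em", ["world"]), "!"])
print(transform(doc, {"em": "strong"}))
# → ('p', ['Hello ', ('strong', ['world']), '!'])
```

A dedicated tree-transformation language would still win once you need pattern matching on subtree shapes rather than single tags, which is roughly where both XSLT and ANTLR's tree grammars aim.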

I agree with whoever wrote that it is unnecessary to create a new programming language. Actually, I think it is necessary to build a new computer. Maybe a von Neumann machine is simply not able to process information the way our brain does. Can you see any fundamental difference between current computers and those of the '80s? Well, current ones are faster and have much more storage and memory, but they are as stupid as the old ones.

I'm not proposing to build new hardware right away, because we can emulate it on current computers while we improve the new design, and we can do that using Java or any other procedural language.

The sample text with randomized letters is fascinating. It's still quite readable. There must be a lot of shared knowledge about word spellings and word occurrences in sentences for this to work.

But it's interesting to note that the sample text had only micro-randomising. If word, or sentence, or paragraph order were randomized the text would become completely incomprehensible. So this example might be more of the exception that proves the rule.

i want to point out another amazing ability of the human brain: taking random facts to draw conclusions unrelated to these facts.

the natural language example is particularly bad: it proves that we recognize words, not letters (ok.) and that we don't process words one letter at a time, left to right. it's long been known that we process chunks of information rather than small information units, and that we do it in parallel rather than sequentially.

this is interesting, and valid. but it doesn't mean humans are great "compilers". it also has not much to do with being able to recognize patterns in noise (which we, in fact, do well, but it's unrelated).

i think we should keep in mind that sometimes we use computers because we don't want fuzzy. you don't want banking software that keeps your account "somewhat" accurate, give or take. you don't want banking software that sometimes makes errors.

to point out the underlying conflict here: sometimes we don't want to do better than human. and in this case, we don't want to emulate humans. what you do while programming is define precise mathematical logic (well - ideally). and create constructs of logic - architectures - which won't break down.

where we do want to emulate humans - all fields of artificial intelligence - we obviously need radically different approaches. the von-neumann machine isn't going to do it, whether you use Java, Lisp, or an ideal Natural Language...

they are two different things. stop pretending the brain is just a big parallel computer. there is no evidence for this. did you know that when mechanics became really popular in the 1800's, people explained the brain using mechanics? they thought it was just lots of small and complicated mechanics. today, it's computers...