Programming languages are formal languages, but unlike many formalisms, they also carry inherent meaning, defined by their operational semantics or, in the case of markup languages, their presentation semantics. And though formal, programming languages resemble natural languages in the kinds of communication they allow.

While the development of programming languages is artificial, the natural-language processes of evolution, borrowing, intermixing, and mutation all have fairly clear artificial counterparts. So I'm wondering: has there ever been any large-scale, in-depth research into the evolution and behaviour of programming languages from a linguist's perspective?

Perl was made by a linguist, and it is messier than the English language itself.
–
JobMar 23 '11 at 15:35

1

@Job: And it's been around for over 23 years, is installed by default on countless Unix-like platforms, and is still used regularly for everything from automation (for which it's way cleaner than shell scripting) to Web development (for which it's way cleaner than PHP) to its original purpose of text processing with regular expressions (for which it has established the de facto industry standard). Sure it's messy, but in a way that works well for people, and that's where in Perl's development Larry's linguistics background was a boon.
–
Jon PurdyMar 23 '11 at 16:36

3 Answers
3

Remember that formal grammars, without which modern programming could not be, are the product of the research of the linguist Noam Chomsky.

A car accident kept me from finishing a graduation thesis on the subject you ask about, so there's no references I can give you, only an opinion.

Spoken languages evolve at varying speeds depending on the context, and they do so in ways as unpredictable as the human contexts themselves. The outcome of WW2 had huge effects on the Japanese language. Britons, Australians, South Africans, and North Americans don't quite speak the same language. The verb conjugations used in what used to be the Spanish colonies have diverged considerably after two hundred years of independence (the ex-colonies think that the Spanish of Spain is archaic).

The sheer force of efficiency over phonetics means that words used with different frequencies in different regions come to be pronounced differently: very common words are clipped or slurred, while less common ones are articulated as accurately as possible.

Natural languages, with their variations, nuances, and evolution, are not apt for the determinism we demand of computers. (Gee! Given how common misinterpretations and second-guessed interpretations are, it seems they're not apt even for the simplest interactions among humans [refraining from quoting jokes about what a girlfriend/boyfriend says and what it really means].)

In our research (I had a tutor) we looked at Greek and Latin because they had well-defined grammars that covered every role a word could play in a sentence through its declension. It wasn't good enough: records of how people actually spoke those languages show that real speech differed greatly from what the grammars prescribed, just as it does in modern languages.
–
ApalalaMar 20 '11 at 22:39

4

Regarding ancient Greek and Latin - part of the issue here is that the surviving texts tend to be formal in nature - essays, contracts, legal rulings, etc. If you think about the messages we send day to day - "Hi Honey, Please get milk on way home" and "Jim - remember the Casey report for the 9 o'clock" - most of these transient messages in ancient Rome will have been lost forever in time.
–
HorusKolMar 20 '11 at 23:23

2

That's not entirely true. My mother occasionally talks about her high-school Latin classes, and mentioned one piece they translated, a memo from a Roman patrician to his chariot driver. The gist was "Please, during rush hour, DON'T get caught behind so-and-so's chariot. I don't know what he feeds his horses, but the stench is TERRIBLE."
–
John R. StrohmMar 21 '11 at 16:03

For those of us with decades in the field it is obvious that programming languages have interbred, and that one therefore finds most aspects of any pure paradigm in most modern programming languages, the so-called multiparadigm languages: C#, Python, Java, and so on. Even primarily functional languages like OCaml and Haskell include enough procedural features (in Haskell's case, through monads) and OO features to let you do almost anything.
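A minimal sketch of that interbreeding, using Python (itself a multiparadigm language): the same task, summing the squares of some numbers, written procedurally, with objects, and functionally. The class name is illustrative, not from any real library.

```python
from functools import reduce

nums = [1, 2, 3, 4]

# Procedural: explicit loop and mutable accumulator.
total = 0
for n in nums:
    total += n * n

# Object-oriented: state and behaviour bundled in a class.
class SquareSummer:
    def __init__(self, numbers):
        self.numbers = numbers

    def sum(self):
        return sum(n * n for n in self.numbers)

# Functional: a fold with no mutation.
total_fn = reduce(lambda acc, n: acc + n * n, nums, 0)

assert total == SquareSummer(nums).sum() == total_fn == 30
```

One language accommodates all three styles, which is precisely why switching languages per subproblem became unnecessary.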

What has happened, I think, is that it became obvious that it was costly (when not silly) to have to switch programming languages just to apply the right paradigm to a given subproblem.

There remains an exception to the trend in the area of highly parallel and asynchronous systems. There the preferred languages are strictly functional, like Erlang, probably because it is easier to think about such complex systems functionally.

The non-paradigmatic part of the evolution has been on syntax. Languages that encouraged or even allowed cryptic programs have become less and less used (APL, AWK, and even Perl and LISP). The dominating syntaxes today are those of more readable (as opposed to easily writable) languages like C (C++, C#, Java, Objective-C, Scala, Go, IML, CSS, JavaScript, and also Python), Pascal (Fortran 90+x), Smalltalk (Ruby), ML/Miranda (OCaml, Haskell, Erlang), and SGML (HTML, XML).

This diagram is not completely accurate, and it is not up to date, but it gives a good idea of how much programming languages have converged since the language-per-site era of the 1970s.

This is much more like what I was looking for. I guess I'm also looking for direct correlation with morphology and phonology, on top of the obvious association with syntax that comes from working with formal grammars.
–
Jon PurdyMar 24 '11 at 0:53

@Jon Well, the other obvious trend is that English was and is the dominating natural language underlying all programming languages, both syntactically and grammatically. Programming languages are left-to-right, verb-first. Japanese, for instance, is very different, but I know of no efforts to develop a Japanese-style programming language. en.wikipedia.org/wiki/Japanese_language#Sentence_structure
–
ApalalaMar 24 '11 at 13:38

@Apalala: SOV order is common in stack-oriented languages, infix operators count as SVO, and functions (Lisp being the pathological example) are VSO. English definitely has a strong influence, but I think there are other factors at work... I may have to do this research myself. :P
–
Jon PurdyMar 24 '11 at 15:19
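The three word orders from the comment above can be sketched in a few lines of Python; the prefix function and the postfix evaluator are hypothetical helpers, standing in for Lisp's `(+ 3 4)` and Forth's `3 4 +` respectively.

```python
# SVO (infix): the "verb" sits between its operands, as in C-family syntax.
svo = 3 + 4

# VSO (prefix): the verb comes first, as in Lisp's (+ 3 4).
def add(a, b):
    return a + b

vso = add(3, 4)

# SOV (postfix): both operands precede the verb, as in Forth's "3 4 +".
# A toy stack evaluator, not a real Forth.
def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(int(tok))
    return stack[0]

sov = eval_postfix(["3", "4", "+"])
assert svo == vso == sov == 7
```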

@Jon Yes, I forgot about Forth and PostScript, which are both stack-based and SOV. Please let me know if you start the research. I specialized in language theory at the university, and programming languages are still my hobby. I have first-hand recollections of many of them (Simula, Prolog, LISP).
–
ApalalaMar 25 '11 at 2:00

@Jon You may find it interesting that many Spanish-speaking programmers prefer to use identifiers in Spanish even though they fit badly with the programming languages, libraries, frameworks, standards, and tools they use. Their programs end up in "Spanglish". I've seen the same with programmers whose native languages have a Roman/Latin heritage, like those of Eastern Europe. I have no idea what Far-Eastern (Chinese, Japanese, Korean), Russian, or Arab programmers prefer to do.
–
ApalalaMar 25 '11 at 2:06

I like @Apalala's answers, which appear to show a convergence to a few major general-purpose languages. That only makes sense, since a good idea in one can sooner or later be picked up by the others.

What I would add is that whenever one uses a language, one necessarily extends it by adding terms, transforming it into a language more oriented to the domain at hand. Sometimes this is fairly straightforward, sometimes not.
Here's an example that was not so straightforward.

A property I appreciate in a general purpose language is the extent to which it assists in the definition of new domain-specific languages.
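As one illustration of that property, here is a minimal sketch (in Python) of using a host-language feature, operator overloading, to embed a small domain vocabulary: a toy units-of-length DSL. All names here are illustrative, not from any real library.

```python
class Length:
    """A length stored internally in metres."""

    def __init__(self, metres):
        self.metres = metres

    def __add__(self, other):
        # Overloading "+" lets domain values compose with native syntax.
        return Length(self.metres + other.metres)

    def __repr__(self):
        return f"{self.metres} m"

# Domain "terms" added to the language:
def metres(x):
    return Length(x)

def feet(x):
    return Length(x * 0.3048)

total = metres(10) + feet(5)  # reads like the domain, runs like the host
assert abs(total.metres - 11.524) < 1e-9
```

The better the general-purpose language supports this kind of embedding, the less need there is to build a standalone language for each domain.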