Internationalisation of Programming Languages

This research project explores the internationalisation of programming languages: that is, the internationalisation and localisation of programming languages themselves, as opposed to the way these languages are used to produce localised user interfaces for end users.

The goals of this research include:

explaining issues related to this topic

establishing the state of the art on the topic

exposing the prominent difficulties this field involves, from both a social and a technical point of view

exploring existing theoretical solutions to these problems, and possibly making original proposals

Although they are not the core topic of this research, programming languages with a single-language lexicon may be explored if they offer compatibility facilities with programming languages based on a different lexicon.

This research does not aim to cover any other topic; in particular, it will not cover compilers and programming languages in general, outside of internationalisation and localisation.

Whenever possible, we should reflect on how our choices affect linguistic diversity and multilingualism on the Internet. Programming languages are an important part of the infrastructure of the Internet, and of the digital revolution as a whole. In light of these issues, allowing people to write software using whichever lexical inventory they prefer should be an important concern for every programming language.

As a side note, protocols are clearly an important part of the digital infrastructure too. Some of them include mnemonics drawn from a spoken language, like HELO in SMTP. Others rely more or less extensively on a numeric nomenclature, such as the status codes of HTTP, which is a far more internationalisation-friendly approach. However, protocols are mostly outside the scope of this research and will not be developed further hereafter.

Although most programming languages require only a few dozen mandatory keywords, it is usually the whole ecosystem built around this small kernel, such as its libraries and documentation, that reinforces the positive feedback loop of using a single spoken language as the lexical source for the abstract interface that a program's source code constitutes.

Of course, there are many cases where using a language widely spread at the international level brings many benefits. This research project does not deny these positive aspects of a common international working language. They include broader usefulness of code, which can be reused in many more situations with only minor adaptations to local constraints, as well as a larger pool of potential reviewers for libre software projects.

That said, these advantages do not always matter much. Many users may need to perform small programming tasks for very specific cases, in the form of very short code. Some projects are tightly bound to a local context, such as a system implementing local laws. Other cases, such as education, illustrate situations where using a non-native language may add an unnecessary burden to the programming task.

On the other hand, a proliferation of completely unrelated in-house languages, or of incompatible localised derivatives of existing programming languages, combines the drawbacks of both situations. Hence the idea of internationalising and localising programming languages themselves.

This section lists past and existing technologies related to the research topic. Each subsection reviews a documented language as comprehensively as possible. No restriction is placed on which languages are covered, but priority goes to the most popular languages and to those with the most advanced features on the topic. Contributions covering any programming language are welcome. Subsections are ordered alphabetically[note 1].

The research will look in particular at the top 20 of the TIOBE index and at the most popular languages on open source-code sharing platforms such as GitHub[1].

Assembly language is generally a simple mapping between a set of words (mnemonics) and a list of numbered operations (opcodes), although some assemblers provide "macro" keywords that perform a typical sequence of operations in a single command. For this kind of language, translexicalisation into any spoken language is therefore rather straightforward, since mapping words to numbered operations is how such a language works anyway.
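To make the previous point concrete, here is a minimal sketch of an assembler whose mnemonics come from interchangeable per-locale tables. The toy instruction set, its opcode numbers and the French mnemonics (CHARGER, AJOUTER, RANGER, ARRET) are all hypothetical, invented for illustration; real assemblers and instruction sets differ. What it shows is that the spoken-language layer is only a lookup table in front of the numeric opcodes, so two locales produce identical machine code.

```python
# Hypothetical toy instruction set: the opcode numbers are invented for illustration.
OPCODES = {"load": 0x01, "add": 0x02, "store": 0x03, "halt": 0xFF}

# Hypothetical per-locale mnemonic tables: each maps a spoken-language word
# to a canonical operation name, which in turn maps to a numeric opcode.
MNEMONICS = {
    "en": {"LOAD": "load", "ADD": "add", "STORE": "store", "HALT": "halt"},
    "fr": {"CHARGER": "load", "AJOUTER": "add", "RANGER": "store", "ARRET": "halt"},
}


def assemble(source: str, locale: str) -> bytes:
    """Translate one-instruction-per-line assembly into bytes.

    Each operand must be an integer literal between 0 and 255
    (decimal, or prefixed like 0x20).
    """
    table = MNEMONICS[locale]
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments and surrounding spaces
        if not line:
            continue  # skip blank or comment-only lines
        mnemonic, *operands = line.split()
        out.append(OPCODES[table[mnemonic.upper()]])
        out.extend(int(op, 0) for op in operands)  # base 0 accepts 10, 0x20, ...
    return bytes(out)


english = """
LOAD 10   ; accumulator <- 10
ADD 5
STORE 0x20
HALT
"""

french = """
CHARGER 10   ; accumulateur <- 10
AJOUTER 5
RANGER 0x20
ARRET
"""

# Both sources assemble to the same machine code.
assert assemble(english, "en") == assemble(french, "fr")
print(assemble(french, "fr").hex(" "))  # 01 0a 02 05 03 20 ff
```

Routing every word through a canonical operation name means that supporting a new locale only requires a new table, not a new assembler.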

Blockly is a programming-language-agnostic graphical programming environment similar to Scratch. Because it aims to be a teaching environment for first steps in programming, it has been translated into many languages (including Klingon).

Perl 6 comes with native facilities for creating grammars that can then be fully integrated into the interpretation stack[2]; such a language extension is called a slang. A slang can be as simple as a relexicalisation, like Mosdef, which makes it possible to use def rather than sub as the keyword introducing a new function. But the mechanism is flexible enough to parse a whole programming language: the v5 slang parses Perl 5, which despite the name rests on a completely unrelated technology stack. In fact, Perl 6 itself is parsed and executed using a Perl 6-style grammar[3].
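Perl 6 slangs operate on the grammar itself, which is beyond a short example. The simplest case mentioned above, relexicalisation, can however be sketched in Python with the standard `tokenize` module: keywords written in a localised lexicon are swapped for the canonical ones before the code is run, much as Mosdef lets def stand in for sub. This is not how Perl 6 implements slangs; the French keyword table and the `relexicalise` helper are hypothetical, for illustration only.

```python
import io
import tokenize

# Hypothetical French-to-Python keyword table, for illustration only.
FR_TO_PY = {
    "fonction": "def",
    "si": "if",
    "sinon": "else",
    "tantque": "while",
    "retourner": "return",
    "Vrai": "True",
    "Faux": "False",
}


def relexicalise(source: str, table: dict[str, str]) -> str:
    """Replace localised keywords with canonical Python keywords.

    Works on tokens rather than raw text, so string literals and comments
    are left untouched. Any identifier spelled like a localised keyword
    is rewritten too, a limitation of this naive approach.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            tok = tok._replace(string=table[tok.string])
        tokens.append(tok)
    # untokenize re-creates the original spacing between tokens from their positions.
    return tokenize.untokenize(tokens)


source_fr = '''
fonction signe(x):
    si x > 0:
        retourner 1  # "si" in a comment is left alone
    sinon:
        retourner -1
'''

source_py = relexicalise(source_fr, FR_TO_PY)
namespace = {}
exec(source_py, namespace)
print(namespace["signe"](5), namespace["signe"](-2))  # 1 -1
```

Because the substitution happens on tokens, the comment keeps its French word, whereas a naive text replacement would have altered it. Library names would still need their own localisation, since only the keyword table is translated here.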