Singular Value Consulting

Russian novel programming

One of the things that makes Russian novels hard to read, at least for Americans, is that characters have multiple names. For example, in The Brothers Karamazov, Alexei Fyodorovich Karamazov is also called Alyosha, Alyoshka, Alyoshenka, Alyoshechka, Alexeichik, Lyosha, and Lyoshenka.

Russian novel programming is the anti-pattern of one thing having many names. For a given program, you may have a location in version control, a location on your hard drive, a project name, a name for the program executable, etc. Each of these may contain slight differences in the same name. Or major differences. For historical reasons, the code for foo.exe is in a project named ‘bar’, under a path named …

I thought about this today when looking into a question about a program. A single number had different names in several different contexts. There’s the text label on the desktop user interface, the name of the C# variable that captures the user input, the name of the corresponding C++ variable when the user input is passed to the back-end numeric code, and the name used in the XML file that serializes the variable when it goes between a database and a web server. Of course these should all be coordinated, but there were understandable historical reasons for how things got into this state.
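One way to keep such names coordinated is a single table that records what the quantity is called in each layer. The following is only a minimal sketch under assumptions: the quantity `max_iterations`, the `NameRecord` type, and every label and identifier in it are hypothetical and are not taken from the program described above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NameRecord:
    """Every name one quantity goes by, one field per layer (all hypothetical)."""
    ui_label: str
    csharp_variable: str
    cpp_variable: str
    xml_element: str


# Hypothetical registry, keyed by a canonical name chosen for illustration.
REGISTRY = {
    "max_iterations": NameRecord(
        ui_label="Maximum iterations",
        csharp_variable="maxIterInput",
        cpp_variable="n_iter_max",
        xml_element="MaxIter",
    ),
}


def canonical_name(alias: str) -> str:
    """Return the canonical key for any known alias; raise KeyError otherwise."""
    for key, record in REGISTRY.items():
        aliases = {getattr(record, f.name) for f in fields(record)}
        if alias == key or alias in aliases:
            return key
    raise KeyError(alias)
```

With a table like this, a question such as "what is `n_iter_max` called in the XML file?" becomes a lookup rather than an archaeology project.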

You get that a lot with geometrical code. A few years back I worked on the interface between a CT scanner and a treatment planning system for radiation therapy. The CT scanner exported things head first and the planning system wanted them feet first, so this software had to sit in the middle and reverse the order of everything. Similarly, in the planning system left and right were A and B, but on the machine they were Y1 and Y2. You had to convert coordinate system names three times to do anything. At least it wasn’t converting a path defined in polar coordinates to cylindrical or anything, but still, you’d think the industry could come up with standard names rather than needing converters at each stage to translate things.

In software, a big source of this is APIs. You might find a charting API, say, where you want x and y to have real “names” like age and height, but the API takes x and y[].

I would love a file system that realizes I have 6 copies of the same file on my hard drive (because I’m dumb and lazy): the same MP3 under the band’s name, in misc from when I dumped it off a dying MP3 player, in a folder I made when I downloaded it from Bandcamp, and so on.

I’d love for it to automatically store these in the same physical place, and then make them appear wherever it puts them on the hard drive.

Or am I the only one who goes “I forget where I put that giant file, I’ll just redownload it from my email/CD/portable HD”?
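Short of a file system that does this by itself, the first half of the wish (noticing that several names point at the same content) can be approximated by hashing file contents. This is a minimal sketch, assuming the files fit the usual "same bytes, different path" case; the function names `file_digest` and `find_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each content digest to the sorted paths under root that share it.

    Only digests shared by two or more regular files are returned.
    """
    by_digest = defaultdict(list)
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            # Skip symbolic links so one file is not counted under two names.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

On file systems that support hard links, the duplicates found this way could then be replaced with links to a single copy (for example with `os.link`), with the caveat that editing the file under one name then changes it under all of them.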

I thought that Russian Novel Style was the anti-pattern of having to create a long prologue before getting to specify any plot. E.g. the joke “How is writing Java like writing classic Russian literature? You have to introduce 100 names before anything can happen.” — @jamesiry

I am Russian and I do not like that you use the word «Russian» in a negative context. I thought a little about how I could humble English speakers proportionally, and this is what I came up with.

ENGLISH-STYLE REPORTING

I have done a lot of software translations, and all of them suffered from an effect I suggest calling «English-style reporting». It is an anti-pattern used by English-speaking programmers which prevents software localisation to other languages.

Let’s say a computer program wants to report its state (for example, an error) to the user. The state is composed of several parts, say part A and part B, which can take different values in different combinations. For example, A may be «file» or «memory», and B may be «cannot be read» or «cannot be written».

What English programmers typically do when they want to report this complex state to the user is concatenate the strings corresponding to the different parts of the state, which leads to outputs like «file cannot be read» or «memory cannot be written». So far, so good.

Now suppose that such ill-conceived software has to be localised to another language, for example Russian. In Russian, file has masculine grammatical gender (file is a «man»), and memory has feminine gender (she is a «woman»). The passive verb «to be read» in Russian has different forms depending on whether it is applied to a man or to a woman.

So software which uses string concatenation to report its complex state cannot be easily translated to other languages.

Another example: «1 apple», but «2 apples». Naive English-speaking programmers put into the localisation file the strings «apple» (used for 1 apple) and «apples» (used for 2 or more apples). Such software also cannot be localised, because other languages do not work this way. For example, in Russian we have 3 forms of the word that follows a numeral; the correct form is determined by the last two digits.
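The rule for Russian can be sketched in a few lines. This is an illustration rather than a full localisation layer: the category labels `one`, `few`, and `many` and the function names are chosen for this example, and only whole numbers are handled.

```python
def russian_plural_category(n):
    """Pick the Russian plural category for a whole number.

    The choice depends only on the last two digits of n.
    """
    n = abs(n)
    last_two = n % 100
    last_one = n % 10
    if last_one == 1 and last_two != 11:
        return "one"    # 1, 21, 31, 101 ... but not 11
    if 2 <= last_one <= 4 and not 12 <= last_two <= 14:
        return "few"    # 2-4, 22-24 ... but not 12-14
    return "many"       # 0, 5-20, 25-30, 111-114 ...


# The three forms of "apple" that follow a numeral.
APPLE_FORMS = {"one": "яблоко", "few": "яблока", "many": "яблок"}


def count_apples(n):
    """Format a count of apples in Russian, for example '22 яблока'."""
    return f"{n} {APPLE_FORMS[russian_plural_category(n)]}"
```

The point is that the number of forms and the rule that selects among them both belong to the language, so they have to live in the localisation data rather than in an `if n == 1` in the program.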

Last example: a programmer, without much foresight, implemented an array with 12 month names: «January», … , «December». Looks good, but in Russian each name can be in 6 grammatical cases depending on other words in the sentence.

Such software cannot be translated by just providing a corresponding language file. For each English version we need to modify the source code to turn it into a Russian one. As a result, a new Russian version appears much later than the English one. WordPress (used on the Web to power blogs) is an example of such software. Each time I download a new Russian version of WordPress, I apply a patch of a few hundred lines of code just to make it even more Russian.

So, English-style reporting is when a programmer encodes most natural-language details directly in the code and puts just words or parts of sentences into the localisation file, making the software impossible to localise by modifying the language file only.

P.S. The Russian version of Microsoft Windows suffers a lot from English-style reporting.
P.P.S. Alyosha, Alyoshka, Alyoshenka, Alyoshechka, Alexeichik, Lyosha, and Lyoshenka are the same words to my ear, just like cat, kitty, pussycat, pussy and so on to you.

Anton: I’d say assigning gender to words is an inherent flaw in the language which should be purged anyway. In other news, I’m an elitist prick who thinks logic trumps history. (Yes, I also consider Chinese a very dumb language since it doesn’t lend itself to typesetting or use in a computer. To be fair that is because no one bothered to simplify the early symbols as was done in other languages as they became more abstract, [My history textbook has some great comparative diagrams of stages of Mesopotamian and Egyptian writing], so you can blame early Chinese traditionalists.)

I suppose that for months you could simply encode which month it is as a number (1, 2, 3…) and pass that off to a subroutine to name it. I do wonder how much of the time it takes WordPress to load now is going into dumb grammatical things I’m not going to read anyway…
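A subroutine of that kind might look like the sketch below. It takes the month number and also the grammatical case, since the number alone does not determine the form. Only two of the six Russian cases are shown, and the names `MONTHS_RU`, `month_name`, and `format_date` are invented for this example.

```python
# Russian month names in two of the six grammatical cases.
MONTHS_RU = {
    "nominative": [
        "январь", "февраль", "март", "апрель", "май", "июнь",
        "июль", "август", "сентябрь", "октябрь", "ноябрь", "декабрь",
    ],
    "genitive": [
        "января", "февраля", "марта", "апреля", "мая", "июня",
        "июля", "августа", "сентября", "октября", "ноября", "декабря",
    ],
}


def month_name(month, case="nominative"):
    """Return the Russian name of month 1..12 in the requested case."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return MONTHS_RU[case][month - 1]


def format_date(day, month):
    """Format a day and month; a date uses the genitive form of the month."""
    return f"{day} {month_name(month, 'genitive')}"
```

The program passes around the number and the code that builds the sentence decides which case it needs, so the lookup itself is cheap; the real cost is that the calling code has to know which case applies.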

I knew a guy who claimed to have done a find and replace on names in a Tolstoy book (not War and Peace). He claimed substituting, say, “Fred” for “Alyosha” made the book much easier for him, but almost impossible to discuss with others in his class…

He was always a joker, so I never knew if he really did do it. I DID see his comic book version of Moby Dick, the one that he’d also claimed to leave prominently out on his desk during class to annoy his professor.

@Anton
I also think there shouldn’t be gender-specific words, and this seems to be more a problem with Russian than with English.

But I think there is also a problem in the code.
You shouldn’t just translate A and B, you should translate a string like “&1 cannot be &2”. It gives you much more room to correct grammar, change the order of words, etc.
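A rough sketch of that idea follows: the sentence template is itself a translatable string, and each locale's data says which form of the predicate goes with which noun. The layout of `NOUNS`, `PREDICATES`, and `TEMPLATES` and the `report` function are assumptions made for this example, not the format of any real localisation library.

```python
# Each noun carries its grammatical gender (None where the locale has none).
NOUNS = {
    "en": {"file": ("file", None), "memory": ("memory", None)},
    "ru": {"file": ("файл", "m"), "memory": ("память", "f")},
}

# Each predicate has one form per gender the locale distinguishes.
PREDICATES = {
    "en": {
        "unreadable": {None: "cannot be read"},
        "unwritable": {None: "cannot be written"},
    },
    "ru": {
        "unreadable": {"m": "не может быть прочитан", "f": "не может быть прочитана"},
        "unwritable": {"m": "не может быть записан", "f": "не может быть записана"},
    },
}

# The whole sentence pattern is translatable, so a translator can reorder it.
TEMPLATES = {
    "en": "{subject} {predicate}",
    "ru": "{subject} {predicate}",
}


def report(locale, subject, predicate):
    """Build a status message, choosing the predicate form that agrees with the noun."""
    noun, gender = NOUNS[locale][subject]
    predicate_text = PREDICATES[locale][predicate][gender]
    return TEMPLATES[locale].format(subject=noun, predicate=predicate_text)
```

For example, `report("ru", "memory", "unreadable")` picks the feminine form, which plain concatenation of two independently translated strings cannot do.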

i18n is hard and there is nothing to be done about it… Wait, wait… There is one thing to do: just do not do it if you do not know the cultural features of other languages. It is a conceptual mistake to do i18n if you speak only English, just like using SQL for unclear purposes without understanding it.

Probably, linguistic discussions are misplaced here. It is not a CS question whether grammatical gender is a good linguistic feature or not. It’s just there.
Still, I’d like to add that grammatical gender enables “short-addressing” of recent objects by gender without any loss of expression. It shortens a bunch of sentences while making them more informative. But it’s hard to use correctly even for Russians. Two years ago coffee in the Russian language officially changed gender to neuter; before that it was strictly masculine, and anyone who used the neuter was considered dumb.

@Canageek @Jussi Although I’m also Russian, I still hate genders for words. This is because I’m learning German, and those pesky Germans also have genders for words. And guess what? Those genders are different from the Russian ones!
And by the way, if we are going to exterminate genders, let’s start with English, where “ship” is “she”. And let’s not stop at that. I think that articles are next for extermination. You’d still understand me even without “a”, “an”, “the”, so why use them? This is an inherent flaw in the language which should be purged anyway!

@Canageek, @Jussi
Funny people, blaming the language. So will it be English that is at fault if things can’t be translated into English? For example, English fails pretty badly at providing analogues of Eastern honorifics. Or Japanese nouns, which have neither gender nor plurality: shall we purge these from Western languages?

could we say that these functions do the same?
Or myvar vs *myvar is it the same?

How could I call Alex in English while conveying, in one word, that I know him well, that I am older than he is, my gender, and my opinion of him?
So this is a problem of the reader, who chose an unadapted translation or was not well prepared for reading such literature.
I believe that every language has such patterns, but before calling them bad we should try to understand why they are so.

Either you are too sensitive, or you miss the point, perhaps due to language issues. There is no implication here that Russian novels are bad, or that Russians are bad, or that Russians are bad programmers. The point is that the experience that English-speakers have when reading the names in Russian novels is challenging, and that we shouldn’t add that sort of challenge to the software that we write.

Perhaps it is hard for you to imagine that Russian names are difficult for Americans, but you would have to have a poor imagination to think that everyone in the world naturally associates Alexander with Sasha.

Most importantly, understand that just because a type of behavior is bad in programming, doesn’t mean that it is bad. I’m hoping that the pasta makers will allow us to refer to spaghetti programming without needing to find a criticism of American cuisine.

@Dmitry, @Alexander
Yes, there is no perfect language.
In English she/he can be substituted with ze, and him/her with hir.

But I think “the” is useful sometimes, as it makes a distinction between specific and general. However, I’m not sure I always use articles correctly… my native language is Finnish, which has pretty horrible features!

You use letters to make words, and in the same manner you should be able to use words to make sentences. However, if context changes words, there is something wrong, just as every word shouldn’t require its own alphabet.
Why add complexity without adding new information?
Or perhaps more generally, is it worth sacrificing the modularity of words?

John, you’ve just explained why members of the offshore team I worked with kept changing names.

To those who want to make a language argument, note that any argument made, including the original post, is all based on translation, or lack thereof. In the case of the original article it is the failure to translate the sense of formality or familiarity with a single character that is the problem not that the original text made such distinctions.

Although I understand that there is value in literal translations, in the case of the novels it seems that the translator actually decreased the readers’ understanding by doing so, which ultimately proves both John’s and Anton’s underlying points.

The variations on names in Russian novels are not that hard to get used to, and you can debate the artistic merits of different translation strategies. I didn’t mean to imply anything negative about Russian novels. I needed an example of a context where things often have multiple names, and Russian novels came to mind.

I’m traveling as I write this, and just this afternoon I had trouble finding my hotel because the street that it is on has two names: one on the side of the buildings and another on street signs. And they’re not recognizable variations of each other, at least not recognizable to a foreigner like me.

Just a note that the proliferation of names is hardly restricted to Russian novels. Try a Regency romance some time. But of course these are simply reflecting the naming situation in the upper classes of Britain at the time. E.g. Augusta Ada Byron, Ada Byron, Ada King, Baroness King, Countess Lovelace, and Ada Lovelace are the same person.

Jussi: English doesn’t do that; we add things to words to specify tense all the time. Also, there are two schools of thought. English uses mostly static words and uses order to specify the relationship. Latin modifies the words, but doesn’t care about the order, which has its own advantages.

Dmitry: Technically in English ships are items, and thus don’t have gender. However, there is a tradition outside the rules of English grammar to refer to ships as female. Technically calling them she is wrong, but everyone does it. ‘A’ and ‘an’ could certainly be combined, but I’m pretty sure there is a reason to have them. Have you ever had a friend who didn’t bother with grammar? I know someone who drops ‘unneeded’ words when talking online, and doesn’t use capital letters, and his writing is almost impossible to read. All the important words are there, but you have to read it several times to make sense of it.

I see a related problem every day in large system design projects. With n levels of abstraction, each of the m components has an unofficial name, which might differ between the groups involved. Then there are the official names in documentation, often varying and unfixed. Then there are names by location, server names by application, by app server technology, by vendor name, by OS name, by hardware model and vendor name. Then names by DNS, and by machine ID in various system software. By database instance or schema name. Add complex virtualisation, load balancing, and clustering, and you never know whether “sys1” and “service2” have a dependency or not, and what it is.

@Anssi: spot on!
On top of that there are naming conventions, which usually differ between subsystems (e.g. Java code and database). So, the best one can hope for is consistent naming within a context.
Furthermore, enforcing consistent naming across all subsystems might even hurt. Consider addresses: within an e-commerce application you will have at least a shipping address and an invoice address. For the shipping subsystem “address” means “shipping address”; for the invoice subsystem “address” means “invoice address”. Using the unabridged name will make the code less readable, because it will always make you think of the other address.