Everything to do with phonetics. Please note: comments not signed with your genuine name may be removed.

Wednesday, 24 March 2010

digraphia

Serbian is unusual among languages in that it can be written in either of two different alphabets, Cyrillic (like Russian) or Latin (like English). As Wikipedia puts it, “Serbian is a rare and excellent example of synchronic digraphia, a situation where all literate members of a society have two interchangeable writing systems available to them.” Compare the related languages Bulgarian, which is written only in Cyrillic, and Slovenian, which is written only in Latin letters.

On the streets of Belgrade some advertisements or names of businesses are in one alphabet, some in the other. The same shop window may display written messages in both. Many road signs show names of destinations in both, first in one and then in the other, thus for example Београд Beograd.

The letter-to-letter correspondence between the two Serbian alphabets is mostly one-to-one: to each Cyrillic letter there corresponds a Latin letter (perhaps bearing a diacritic), and vice versa. But not entirely so, because there are certain Cyrillic single letters that correspond to Latin digraphs (two-letter sequences). For example, Cyrillic њ corresponds to Latin nj and Cyrillic џ to Latin dž. (Phonetically, these stand for [ɲ] and [dʒ] respectively.)
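Because the Cyrillic-to-Latin direction is a letter-for-letter lookup, it can be sketched in a few lines of Python. This is an illustrative sketch, not an official converter, and the function name `to_latin` is hypothetical. The table is the usual Serbian correspondence, with three Cyrillic letters mapping to Latin digraphs. The only wrinkle is capitalisation: a capital Њ becomes Nj at the start of an ordinary word but NJ in all-caps text, and the sketch guesses which case applies from the neighbouring letters.

```python
# Serbian Cyrillic -> Latin, one entry per Cyrillic letter (30 letters).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for i, ch in enumerate(text):
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # spaces, punctuation, digits, Latin letters
            continue
        if ch.isupper():
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Heuristic: next to another capital -> all caps ("NJ"),
            # otherwise title case ("Nj").
            if prev.isupper() or nxt.isupper():
                lat = lat.upper()
            else:
                lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


print(to_latin("Београд"))  # Beograd
print(to_latin("Његош"))    # Njegoš
print(to_latin("ЊЕГОШ"))    # NJEGOŠ
```

The reverse direction is harder, because a Latin nj, lj or dž is not always a single letter.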

Another notable characteristic of the Serbian writing system is that the spelling is very close to being perfectly phonemic (at least at the segmental level — though stress and possible vowel length are not indicated). No child has to learn how to spell individual words: you just write them the way you say them, and say them the way you write them. (This is of course notoriously not the case in English.)
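Since the spelling is phonemic at the segmental level, a broad phonemic transcription can be produced by per-letter lookup alone. The sketch below assumes one conventional set of IPA values (for instance [ʋ] for в and [tɕ], [dʑ] for ћ, ђ). As some of the comments point out, actual realizations vary by dialect. Stress and length cannot be recovered from the spelling, so they are simply absent from the output. The name `transcribe` is hypothetical.

```python
# Broad phonemic values for the 30 Serbian Cyrillic letters (one conventional choice).
CYR_TO_IPA = {
    "а": "a", "б": "b", "в": "ʋ", "г": "ɡ", "д": "d", "ђ": "dʑ",
    "е": "e", "ж": "ʒ", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "ʎ", "м": "m", "н": "n", "њ": "ɲ", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "tɕ", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "tʃ", "џ": "dʒ", "ш": "ʃ",
}


def transcribe(word: str) -> str:
    """Letter-by-letter phonemic transcription; no stress or length marked.

    Raises KeyError for anything that is not a Serbian Cyrillic letter.
    """
    return "".join(CYR_TO_IPA[ch] for ch in word.lower())


print(transcribe("Ниш"))  # niʃ
print(transcribe("џеп"))  # dʒep
```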

21 comments:

In the streets of Niš, if you don't know both alphabets, you're illiterate.

The "digraphs" in the Latin alphabet are treated as single letters even in Croatia (though not in Slovenia). They go into a single square in crosswords or in vertical writing and have official, fixed places in the alphabet (unlike German ä ö ü ß). Of course, this doesn't eliminate all ambiguity in practice: the same letter sequences also occur as genuine consonant clusters (n + j, for instance), so a Latin nj is not always the letter nj.
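This ambiguity is what makes Latin-to-Cyrillic conversion harder than the reverse. A greedy converter that always reads nj, lj and dž as single letters gets most words right, but it turns the n + j cluster in injekcija ("injection") into њ. Here is a minimal sketch, assuming a small hand-made exception dictionary. The dictionary and the name `to_cyrillic` are hypothetical, and the list is far from complete. A real converter would need a much larger word list and would also have to handle inflected forms and capitals.

```python
# Latin -> Cyrillic for lowercase Serbian words (sketch; capitalisation ignored).
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ",
    "e": "е", "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р",
    "s": "с", "t": "т", "ć": "ћ", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "č": "ч", "š": "ш",
}
# Hypothetical exception list: words where n+j or d+ž is a cluster, not one letter.
CLUSTER_WORDS = {"injekcija": "инјекција", "nadživeti": "надживети"}


def to_cyrillic(word: str, exceptions=None) -> str:
    """Greedy digraph matching, overridden by a whole-word exception list."""
    if exceptions is None:
        exceptions = CLUSTER_WORDS
    word = word.lower()
    if word in exceptions:
        return exceptions[word]
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:  # read nj, lj, dž as one letter
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(SINGLE[word[i]])
            i += 1
    return "".join(out)


print(to_cyrillic("njiva"))                     # њива
print(to_cyrillic("injekcija", exceptions={}))  # ињекција (wrong: greedy reading)
print(to_cyrillic("injekcija"))                 # инјекција
```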

I've encountered handwritten all-caps in Zadar (Croatian coast) where nj was N with a sort of subscript lowercase j wrapped around its lower right corner.

stress and possible vowel length are not indicated

The length is considered part of the pitch-accent system, which is not normally written.

The one concession to morphology is that the adjective suffix -sk- does not fuse with a preceding -t into -ck-, even though it does in Polish, Czech and Slovak. Thus, Croatia is Hrvatska. A few similar (but rare) cases exist with џ vs дж. Otherwise, morphology is ignored (ruski = rus- + -ski).

[ɲ] and [dʒ]

I haven't encountered [ɲ] or [ʎ] so far, only [nʲ] and [lʲ]. I wonder if using the symbols for palatal sounds is just a tradition of IPA transcription to avoid having to use diacritics. But I suppose dialectal variation could be involved.

Whether š ž č dž ш ж ч џ are retroflex varies between dialects; the same holds for whether ć đ ћ ђ are alveolopalatal. Stereotypical Bosnians (regardless of religion) are said to merge the two series entirely.

''Serbian is unusual among languages in that it can be written in either of two different alphabets, Cyrillic (like Russian) or Latin (like English).''

Hindi-Urdu is arguably in a similar situation -- although religious/nationalist extremists on both sides would disagree vehemently!

Are there any Serbian/Croatian (or Orthodox/Roman Catholic) extremists who likewise try to deny that Serbian and Croatian are essentially the same language? Given the recent wars I would think it likely.

I'd argue that Irish also exhibits features of digraphia. Whilst the traditional Gaelic script is clearly Latin-derived, it does use overdots instead of aitches, which makes it look like more than just a mediaevalizing font. Of course this then opens up a can of worms about German before they abandoned Fraktur.

And on a slight tangent, I continue to be impressed by the efficiency of Cyrillic scripts (okay, there are language-specific glyphs). I wonder if anyone's ever seriously proposed a Cyrillic script for English (inasmuch as spelling reform is ever taken seriously in English).

Isn't the "efficiency of Cyrillic scripts" largely confined to their use in Slavic languages? Cyrillic used to write, say, Mongolian doesn't look particularly efficient, at least from a quick look at its Wikipedia page. And, of course, Cyrillic was invented to write a Slavic language originally.

If one looked at the Roman alphabet as used in Romance languages only, it would also look pretty efficient (especially if one ignored French!)

Isn't Yiddish another example, in that it certainly CAN be written in two alphabets - though I don't know whether it is a true "synchronic digraphia", i.e. with native speakers equally comfortable using both alphabets. Does Hindi/Urdu meet that criterion?

Are there any Serbian/Croatian (or Orthodox/Roman Catholic) extremists who likewise try to deny that Serbian and Croatian are essentially the same language? Given the recent wars I would think it likely.

Have a look at Language and Identity in the Balkans: Serbo-Croatian and Its Disintegration by Robert D. Greenberg (Oxford University Press, 2004). It has been a while since I read it (I don't own a copy), but I recall that in the first chapter the author provides some telling anecdotes. In Belgrade he was complimented on his "excellent Serbian" and in Zagreb on his "excellent Croatian," but when he used the "wrong" word for the name of a month on the calendar (it was either the "Serbian" word in Croatia or the "Croatian" word in Serbia), one of his academic colleagues was so outraged that he took the author aside and berated him for using the "foreign" expression. There are even those who distinguish the "Bosnian" language from Serbian and Croatian.

There was a similar politically motivated split between the Macedonian and Bulgarian languages many decades ago.

I'm quite used to seeing all kinds of products with separate descriptions marked as Croatian, Serbian, Bosnian and even Montenegrin (!). I've just checked in my bathroom, and the toothpaste has SRB and BIH/MNE (that's Bosnia and Herzegovina and Montenegro), no Croatian, while the washing powder has CRO/BIH, MNE and SRB texts. (Note the different groupings.) All very similar, but not identical. And it's not any legal or manufacturer data -- you could imagine that you would need separate texts for that. It's product descriptions and instructions. So much for sensitivities ;)

Funny how sometimes you find -- conversely -- products with descriptions marked as SCA or similar for Scandinavian...

Previous correspondents have suggested Irish, Yiddish and (possibly) Hindi/Urdu as languages exhibiting digraphia. Let me add to this list Malay in Brunei, which is written in both the Latin script and the Arabic-based Jawi script.

I don't know if this link will work, but if it does, it shows a road-sign with the same message in both scripts:

@vp Maybe I am just reflecting a cultural bias as to what an efficient alphabet looks like: as you note, most of the Romance languages have fewer sounds than the Slavonic, Germanic, and (to a lesser extent) Celtic ones. And of course none of the Cyrillic alphabets has exactly the right glyphs for writing English: they're basically short on vowels, although some implementations can cause consonantal hilarity (the standard Russian transcription of Sir Edward Heath's surname was re-transliteratable as "Git"). But I suppose what my culturally-biased observation really means is that Cyrillic is a very low-investment alphabet (for those accustomed to Latin and/or Greek) to learn for the benefit of doing away with a load of digraphs.

Of course, doing away with digraphs only matters when the sound of the digraph is different from what would happen if the letters just happened to end up together. I don't know if this happens in Serbo-Croat, but it certainly is an issue in other languages that use digraphs. (For instance, Welsh "ng" [ŋ] is a separate letter that sorts between g and h: so "tangnefedd" (peace) appears in dictionaries somewhere between "tagu" (to throttle) and "taid" (grandfather); but "tangloddio" (to undermine) is just an "n" and a "g" next to each other, the cluster is pronounced as [ŋg], and sorts somewhere between "tanforol" (undersea) and "tanio" (to set on fire). Of course, I'd be very interested to see examples of this in other languages.)
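The Welsh case can be sketched as a sort key. The key splits a word into letters of the Welsh alphabet, reads the digraphs greedily, and consults an exception set for words like tangloddio, where n and g are separate letters. The alphabet order below is one common listing of the modern Welsh alphabet (accented vowels are ignored). The exception set and the function names are hypothetical illustrations, and a real dictionary would need a full list of such words.

```python
WELSH_ALPHABET = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng",
                  "h", "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r",
                  "rh", "s", "t", "th", "u", "w", "y"]
RANK = {letter: n for n, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {x for x in WELSH_ALPHABET if len(x) == 2}
# Hypothetical: words in which "ng" is n + g rather than the letter ng.
N_PLUS_G_WORDS = {"tangloddio"}


def welsh_letters(word):
    """Split a word into Welsh letters, reading digraphs greedily."""
    word = word.lower()
    letters, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS and not (pair == "ng" and word in N_PLUS_G_WORDS):
            letters.append(pair)  # e.g. "ng", "dd", "ll" count as one letter
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def welsh_sort_key(word):
    """Compare words letter by letter by their position in the Welsh alphabet."""
    return [RANK[letter] for letter in welsh_letters(word)]


words = ["taid", "tanio", "tangnefedd", "tagu", "tanforol", "tangloddio"]
print(sorted(words, key=welsh_sort_key))
# ['tagu', 'tangnefedd', 'taid', 'tanforol', 'tangloddio', 'tanio']
```

This reproduces the order described above: tangnefedd falls between tagu and taid, while tangloddio falls between tanforol and tanio.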

Having letters whose Title Case is different from their UPPER CASE is a headache for programmers (if they worry about it; if they don't, it's a headache for readers).

By "Title Case" you meant small-caps, right?

I'd argue that Irish also exhibits features of digraphia. Whilst the traditional Gaelic script is clearly Latin-derived, it does use overdots instead of aitches, which makes it look like more than just a mediaevalizing font. Of course this then opens up a can of worms about German before they abandoned Fraktur.

You want to draw a line somewhere, or I could claim that English has digraphia because it can be written with both serif and sans-serif fonts. And I have seen Gaelic typefaces (though in the variant with "short" glyphs for r and s) used for languages other than Irish (e.g. English) for decorative purposes, for example in advertising.

@wjarek: I've seen ingredient lists in both Czech and Slovak, while the Slovak restaurant in Prague (sic) I once ate in had menus with some words in parentheses. My wild guess is that the menu was in Slovak, with Czech translations in parentheses for words that differ significantly.

As an example, the lower-case form of the letter "dž" is "dž"; the upper-case form of the letter is "DŽ", and the title-case form is "Dž".

Obviously, title case differs from upper case only in "letters" that are digraphs (or polygraphs - composed of more than one written glyph, at any rate).

Not every language that uses digraphs does this, though; for example, I'm told that in Dutch, the "ij" digraph has no title-case form -- so placenames such as IJmuiden start with two capital glyphs (i.e. with a capital letter ij, rather than with a title-case letter ij).
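Unicode does encode dž, lj and nj as single code points with all three forms (for dž: U+01C4 Ǆ, U+01C5 ǅ, U+01C6 ǆ). These exist mainly for compatibility with older encodings, and everyday text usually spells them as two ordinary letters. Python's `str.upper()` and `str.title()` apply the Unicode case mappings, so the difference can be shown directly. The Dutch case needs special handling, because `title()` produces Ijmuiden. The helper `dutch_capitalize` below is a hypothetical name and a deliberately naive rule that only looks at a word-initial ij.

```python
# Single-code-point digraph "dž" (Unicode U+01C4..U+01C6).
DZ_UPPER, DZ_TITLE, DZ_LOWER = "\u01C4", "\u01C5", "\u01C6"

print(DZ_LOWER.upper() == DZ_UPPER)  # True: upper case is U+01C4
print(DZ_LOWER.title() == DZ_TITLE)  # True: a distinct title-case form, U+01C5

# With two ordinary letters the distinction falls out naturally.
print("džep".upper(), "džep".title())  # DŽEP Džep

# Dutch ij: the ligature U+0133 has no separate title-case form...
print("\u0133".title() == "\u0132")  # True: title case equals upper case
# ...and with two letters, str.title() gives the wrong result for Dutch.
print("ijmuiden".title())  # Ijmuiden


def dutch_capitalize(word: str) -> str:
    """Naive rule: a word-initial 'ij' is capitalised as a unit ('IJ')."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


print(dutch_capitalize("ijmuiden"))   # IJmuiden
print(dutch_capitalize("amsterdam"))  # Amsterdam
```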

When Yugoslavia was one country, it had an official language, Serbo-Croat (alongside Slovene and Macedonian). Now Serbo-Croat has been divided into Serbian, Croatian, Bosnian and Montenegrin. Presumably the spelling system of Serbian was not so convenient for the people of Bosnia and Montenegro before they declared their linguistic independence.

The funny thing about English is that its spelling system doesn't correspond to any one dialect: it's a mish-mash.

When speaking about standardizations of the Balkan Language That Has No Neutral Name, I think the best compromise between brevity and accuracy is to say that in the 19th century there were two standards: Standard Serbian, based on the Ekavian sub-dialect of the Shtokavian dialect, more accepting of loanwords, and written in either Latin or Cyrillic; and Standard Croatian, based on the Ijekavian sub-dialect of the Neo-Shtokavian dialect, but with lexical contributions from the Chakavian and Kajkavian dialects, more purist, and written in Latin exclusively. (In the 17th century there had been an earlier Standard Croatian based on the Ikavian subdialect of Kajkavian, now lost.)

Standard Serbo-Croatian was established in 1954. It was a dual standard that recognized both Standard Serbian and Standard Croatian as equally acceptable. After the breakup of Yugoslavia, Croatia reverted to accepting only Standard Croatian as its standard language, and the other countries to accepting only Standard Serbian. Since 1991, two more standards have arisen, Standard Bosnian and (nascently) Standard Montenegrin.

All this is only loosely connected with the spoken dialects and how they are distributed across the various nation-states; if national boundaries followed linguistic ones, there would be no national boundaries, as in most parts of the Balkans.

I have just been in Novi Sad for a week, and Cyrillic and Latin characters seem to be used at random. I have seen street name signs written in Latin characters at one corner and in Cyrillic at the other. I wonder how the children learn to read at school. In bookshops I have seen a mix of books printed in Cyrillic and in Latin...

@janjoker It's very easy to read: 30 letters = 30 sounds. We first learn Cyrillic (at age 7), then Latin (at 8). The problem comes when (Serbian) kids start learning English. You hear all the time: "Why is English sooooooo unfonetical?"