I have been reading Richard A. V. Cox’s Geàrr Ghràmar na Gàidhlig (‘Short Grammar of Gaelic’). It’s very dense, very detailed and 492 pages long, not to mention entirely in Gaelic. To help with that, there is a glossary of the technical vocabulary, which is generally easier to work out than the corresponding vocabulary in English: apocope, syncope and aphaeresis are teasgadh deiridh, teasgadh meadhain and teasgadh toisich.

The immediate family members in Scottish Gaelic are màthair, athair, bràthair, all of which are clearly related to their counterparts in other familiar European languages, and piuthar, “sister”, which looks odd. Irish is yet odder at first glance, with deartháir meaning “brother” and deirfiúr meaning “sister”.

I’ve been reading David Stifter’s Sengoidelc, a readable and reassuring text about Old Irish, the written Irish of the 8th and 9th centuries, which contains at least part of the explanation. It turns out that in Old Irish there were two letters s. One of them lenited by turning into an h, a bit like s in Gaelic becoming sh pronounced /h/, but the other one turned into an f, and the main word that began with that sort of an s was siur, meaning “sister”.

What seems to have happened in Scotland is that the nominative case form was back-derived from the lenited form phiur and assumed to be piur. Conversely in Ireland the nominative form won out, and they say siúr, but mainly, I think, for non-biological sisters, like nurses and nuns. A further difference here: Scotland retains the disyllabic form, whereas in Ireland it’s been simplified to a long vowel.

But why in Ireland do they say deartháir and deirfiúr for your biological siblings? Enter eDIL, the Electronic Dictionary of the Irish Language, which has entries for derbráthair and derbsiur, “true brother” and “true sister” respectively.

“Just eleven months before that”. In my annotation guidelines I have blithely stated “Attributive numbers are N/N”, which is fine for aona, but less so for deug, which I am going to treat as N\N. And yet in trì deug mìle it seems fair enough.

(2) Bha Gàidhlig ga bruidhinn air feadh Alba anns an aona linn deug.

(3) Bha sin ann an naoi ceud deug, fichead ‘s a ceithir.

Years and centuries are interesting. In (2), anns an aona … deug means “in the eleventh”, as opposed to the other examples where deug means “ten”. In (3) the heads look like ceud, fichead and ceithir, so each of these can be N too.

Different rules apply, however, for the personal numbers: aonar, dithis, triùir and so on because if they are not standing on their own, they are followed by a noun in the genitive, for example dithis chloinne (“two children”) where dithis is N and chloinne is N\N.
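To make the type assignments above concrete, here is a minimal sketch of how N, N/N and N\N combine under forward and backward application. It assumes the result-first reading of the slashes (X/Y looks right for a Y, X\Y looks left for a Y); the tuple encoding and the function names are my own illustration, not part of any existing toolkit, and treating linn as N is an assumption.

```python
# Categories: the atom "N", or a tuple (result, slash, argument).
N = "N"
N_FWD = (N, "/", N)    # N/N: expects an N to its right (e.g. aona)
N_BWD = (N, "\\", N)   # N\N: expects an N to its left (e.g. deug, chloinne)

def combine(left, right):
    """Apply forward or backward application to two adjacent categories."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def reduces_to(cats, target):
    """True if some order of adjacent combinations reduces cats to target."""
    if len(cats) == 1:
        return cats[0] == target
    for i in range(len(cats) - 1):
        merged = combine(cats[i], cats[i + 1])
        if merged is not None and reduces_to(cats[:i] + [merged] + cats[i + 2:], target):
            return True
    return False

# aona linn deug: N/N  N  N\N
aona_linn_deug = reduces_to([N_FWD, N, N_BWD], N)
# dithis chloinne: N  N\N
dithis_chloinne = reduces_to([N, N_BWD], N)
# Two modifiers with no noun between them do not reduce to N.
no_head = reduces_to([N_FWD, N_BWD], N)
```

With these assignments both aona linn deug and dithis chloinne come out as a single N, while a pair of modifiers with nothing to modify does not.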

I have partly been quiet here because I have been hard at work putting together something for this: http://www.lattice.cnrs.fr/CLTW/index-en.html
and clearly I should not prejudice the double-blindness of the refereeing too much. Ahem.

I said (four years ago) that Gaelic doesn’t have resumptive pronouns. However, while scouring William Lamb’s Scottish Gaelic for unusual uses of agus, I found these examples, with the resumptive bit in bold.

sin an gille a shuidh Cèit air (that is the boy who Kate sat upon) (do not try this at home)

sin an gille a tha a mhàthair bochd (that is the boy whose mother is ill)

Now, in dictionaries air in the first example is indeed treated as a pronoun, though for subcategorization purposes I prefer to treat it as a PP. The second case, a as possessive pronoun, I’ve been treating as a pronoun, so on my own account what I said about Gaelic was wrong. It may of course be a determiner. The evidence for this off the top of my head is that unlike the small class of prenominal adjectives deagh, droch, sàr and so on, the possessives mo, do, a and so on can’t co-occur with the article an or with gach, and that unlike nouns in the genitive they go before the possessed noun rather than after it. Pronoun or determiner, they have type N/N in categorial grammar.

Apparently there are resumptive pronouns in Irish, but I don’t have enough Irish to make sense of the literature I’ve seen on the subject, so I shall stop here.

One aspect of Gaelic I want to look at more closely is interrogatives. Just as all the wh- words in English (who, when, why, what, how) go to the front of the sentence, so do all the c- words in Gaelic, and the word order in the rest of the sentence changes as well. This is not universal, however. In Chinese, one simply substitutes the word for ‘what’ in the ordinary sentence order, just as when we’re particularly surprised in English we might say “You ate what?”.

In order to see how they work exactly, we need example sentences, so I’ve been looking in DASG. One easy first step is to look at frequencies in this table:

| Interrogative | Count | English | Observations |
|---|---|---|---|
| cò | 9122 | who | noisy; lots of prefixes and parts of words |
| ciod | 4587 | what | |
| cia | 2363 | how | also cia mar in older texts, cia fhad ‘how long’, cia mhòr ‘how big’ |
| dè | 403 | what | also ‘God’ |
| ciamar | 273 | how | |
| càit | 182 | where | also genitive of cat meaning ‘cat’ |
| carson | 133 | why | |
| càite | 90 | where | |
| cuin | 59 | when | |
| cuine | 15 | when | |

These are the results of accent-insensitive searches, as the older texts haven’t had their spelling modernized or made consistent. The results surprised me a great deal, for a number of reasons. Firstly, ciod ‘what’, which I don’t recall seeing terribly often in the present day, is the most numerous interrogative apart from the noisy cò, mostly occurring in a single document, a history of Scotland. One of the very first words you learn in Gaelic is its modern counterpart dè, which only has about 200 (judged by eye) instances as an interrogative in DASG. This is a similar number to càit(e), carson, cuin(e) and ciamar, ‘where’, ‘why’, ‘when’ and ‘how’. Secondly, the enormous number of hits for cia ‘how’, which on a cursory inspection are often exclamations, ‘how swift’, ‘how long’, ‘how horrible’, or an old spelling of ciamar, in addition to the more familiar cia mheud ‘how many’. Thirdly, nearly all of the instances of dè meaning ‘what’ are from a single work, Saoghal Bana-mharaiche, describing the Gaelic from the coast of Easter Ross.
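For anyone wanting to reproduce an accent-insensitive count on their own text, here is a minimal sketch using Unicode decomposition. It is not how DASG itself searches (I don’t know how that is implemented); the function names and the toy token list are made up for illustration.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks so that 'dè', 'dé' and 'de' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def count_word(word, tokens):
    """Count tokens equal to word, ignoring case and accents."""
    key = strip_accents(word).casefold()
    return sum(1 for tok in tokens if strip_accents(tok).casefold() == key)

# Toy token list (hypothetical, not taken from DASG).
tokens = ["Dè", "dé", "de", "càit", "cáit", "cait", "ciamar"]

de_hits = count_word("dè", tokens)      # matches Dè, dé and de
cait_hits = count_word("càit", tokens)  # matches càit, cáit and cait
```

The same folding that catches old and inconsistent spellings is also what lets unrelated words in: unaccented cait is counted alongside càit, which is why counts like these need checking by eye.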

I’ll leave you with a meaning I’d never seen before for gu. This can be gu the preposition, gu the subordinator (as in gu bheil), gu the aspect marker or gu the adverbializer, but Gu dè tha thu? from DASG31, Ugam agus bhuam, is clearly none of these. As explained here, what is going on is this: the Gaelic for ‘what’ used to be ciod e, like the Irish cad é, and over time this became dè. Gu dè is a variant of this. It’s another one of those pesky multiword expressions.

[Edit 2015-01-03 to clarify reason for looking at interrogatives and add another meaning of gu.]

Welcome back. DASG contains eight and a half million words and is a resource I keep coming back to. In my first investigation, I’m looking for the second comparative, which I had never seen before last weekend. Here’s an example:

Is feairrde na stamagan srubag dheth

(The stomachs are better for a wee drink in them.) It’s explained in Gillies’ Elements of Scottish Gaelic Grammar, as differing from the normal comparative (“Xer”) in that it means “Xer by that” or “Xer because of that”. If you search for a word, DASG gives you a concordance so you can look at the local context of words.

Some second comparatives in DASG: feairrd, feairrde, misd, bigid, lughaid. An ambiguous word that might be a second comparative: mòid. I look forward to a POS-tagged version of DASG.

A very quick note to say that I’ve trained MaltParser, a dependency parser, with the current gdbank sentences (a mere 1223 tokens spread across 70-odd sentences), the Universal POS tagging scheme and the current Universal-ish gdbank dependency annotation scheme, and then seen how it performed on an unseen test set of 8 sentences containing 276 tokens taken from an article in The Scotsman from a few years ago.

It got 196 (71%) of the heads right, 207 (75%) of the dependency types right, and both the head and the dependency were right in 187 (68%) of cases. My initial impression is that the main problems are subordinators and my having mis-POS-tagged a few words, but there will be a confusion matrix soon.
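The three figures above are what are conventionally called the unlabelled attachment score, the label accuracy and the labelled attachment score. Here is a minimal sketch of how they can be computed from per-token (head, dependency type) pairs; the function name, the toy data and its labels are hypothetical, and this is not MaltParser’s own evaluation code.

```python
def attachment_scores(gold, predicted):
    """Score predicted (head, deprel) pairs against gold ones, token by token."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    n = len(gold)
    heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labels = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    both = sum(g == p for g, p in zip(gold, predicted))
    return {"heads": heads / n, "labels": labels / n, "both": both / n}

# Toy four-token sentence with made-up labels.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
scores = attachment_scores(gold, pred)

# The percentages reported above, recomputed from the raw counts.
total = 276
reported = {name: round(100 * count / total)
            for name, count in [("heads", 196), ("labels", 207), ("both", 187)]}
```

On the toy sentence this gives 0.75 for heads, 0.75 for labels and 0.5 for both, and the raw counts from the test set round to the 71%, 75% and 68% quoted.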

If you train MaltParser using the learnwo flowchart in place of learn, it does all the same things, except that it writes out the sentences as it reads them in.

This means that if you have, ahem, misformatted any of your input, you can see exactly which misformatting MaltParser is complaining about, because it will be in the first sentence that hasn’t been written to stdout.
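An alternative to reading the parser’s output is to check the file before training. The following is a minimal sketch that assumes the input is in the tab-separated, ten-column CoNLL-X layout with a blank line after each sentence and token ids counting up from 1; the function name is hypothetical, and the checks are only the two mistakes I would expect to make most often, not everything MaltParser validates.

```python
def first_format_problem(text, n_columns=10):
    """Return (sentence number, line number, reason) for the first bad line, or None."""
    sentence = 1
    expected_id = 1
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line closes the current sentence, if one was open.
            if expected_id > 1:
                sentence += 1
            expected_id = 1
            continue
        fields = line.split("\t")
        if len(fields) != n_columns:
            return (sentence, line_no,
                    f"expected {n_columns} tab-separated fields, found {len(fields)}")
        if fields[0] != str(expected_id):
            return (sentence, line_no,
                    f"expected token id {expected_id}, found {fields[0]!r}")
        expected_id += 1
    return None

def row(token_id, form):
    """Build a placeholder ten-column row for the examples below."""
    return "\t".join([str(token_id), form, "_", "_", "_", "_", "0", "_", "_", "_"])

good = "\n".join([row(1, "a"), row(2, "b"), "", row(1, "c"), ""])
# Same data, but with a space where the first tab of the last row should be.
bad = "\n".join([row(1, "a"), row(2, "b"), "", row(1, "c").replace("\t", " ", 1), ""])

good_result = first_format_problem(good)
bad_result = first_format_problem(bad)
```

Here the well-formed input passes and the broken one is reported as sentence 2, line 4, with nine fields instead of ten.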