Wednesday, February 27, 2008

I never liked the comic Garfield. But under this guy's interpretation here, I find it brilliant! I haven't laughed out loud at a comic in years. These versions swing from hilarious to sad and poignant, then back to hilarious.

Who would have guessed that when you remove Garfield from the Garfield comic strips, the result is an even better comic about schizophrenia, bipolar disorder, and the empty desperation of modern life?

Friends, meet Jon Arbuckle. Let’s laugh and learn with him on a journey deep into the tortured mind of an isolated young everyman as he fights a losing battle against loneliness and methamphetamine addiction in a quiet American suburb.

Monday, February 25, 2008

I never took grammatical gender seriously when I studied German. I just made everything feminine ‘cause, ya know, that was the easy one. The rest of my German was so bad, I figured it didn’t really matter anyway, right? (I frikkin LOVED studying Mandarin Chinese because, ya know, who needs morphology?)

…native French speakers don't agree on the genders of French nouns. They really don't agree. Fifty-six native French speakers, asked to assign the gender of 93 masculine words, uniformly agreed on only 17 of them. Asked to assign the gender of 50 feminine words, they uniformly agreed on only 1 of them. Some of the words had been anecdotally identified as tricky cases, but others were plain old common nouns.

[clip]

… second language speakers of French, take heart! Make your grammatical gender agreement mistakes with confidence. There's a chance that your native-speaker interlocutor will agree with your version!

Danke, Heidi! Viel Danke!

Pssst, I should note that David Zubin has done a variety of cognitive linguistic studies on German gender. Most recently, this one:

Friday, February 15, 2008

I've only just now discovered the entirely online corpus search utility Sketch Engine by Adam Kilgarriff, Pavel Rychlý, and Jan Pomikálek. It can replicate a lot of what I do with tgrep2 and Python scripts, but a lot faster (I mean, A LOT faster).

It has the advantages of being fast, easy to use, covering corpora from multiple languages (plus allowing you to add new corpora) and providing user friendly output.

One disadvantage is the brevity of the sketches it provides. For example, I performed a sketch of the verb "prevent" in the BNC and it returned a list of subjects and objects that occur with the verb. Sweet! This is really important stuff if you're interested in FrameNet-type semantic description (see my related post here). Unfortunately, it maxed out at 100 (that's a small sample of the 10,000+ examples).

Nonetheless, this utility goes a long way to providing the sort of user-friendly (yet still sophisticated) online corpus query tools that I think the average non-computationally minded linguist would benefit from greatly.

I've used Mark Davies' BNC interface a lot too and that's also an excellent, entirely online search tool. Davies provides a nice interface to a variety of corpora here.
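For the curious, here is roughly what a sketch-style tally looks like when done by hand in Python. This is only a minimal illustration under my own assumptions: the `triples` list is made-up toy data (not BNC output), the `sketch` function is a hypothetical helper, and ranking by raw frequency is my simplification, not necessarily how Sketch Engine scores its results. The `limit` parameter mimics the cap of 100 mentioned above.

```python
from collections import Counter

# Hypothetical (relation, verb, argument) triples, such as a parser might emit.
# These are invented for illustration and are not drawn from the BNC.
triples = [
    ("subject", "prevent", "law"),
    ("object", "prevent", "accident"),
    ("object", "prevent", "accident"),
    ("subject", "prevent", "police"),
    ("object", "prevent", "disease"),
    ("object", "exclude", "element"),
]


def sketch(triples, verb, limit=100):
    """Tally the arguments seen with `verb`, grouped by grammatical relation.

    At most `limit` arguments are kept per relation, ranked by raw count.
    """
    counts = {}
    for relation, v, argument in triples:
        if v == verb:
            counts.setdefault(relation, Counter())[argument] += 1
    return {rel: c.most_common(limit) for rel, c in counts.items()}


result = sketch(triples, "prevent")
for relation, arguments in result.items():
    print(relation, arguments)
```

Raising `limit` (or dropping it) is the part I wish the online tool allowed; with 10,000+ examples, the long tail of subjects and objects is often where the interesting semantics lives.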

Thursday, February 14, 2008

I just discovered Qeyḥ bāḥrī, a blog by a student of the Tigrinya language.

From his site,

Being from a small city in Canada (Halifax, Nova Scotia) I found it very difficult to learn the mother tongue of my parents, as there are few resources available from which I can learn. So, I decided to create a resource for myself, somewhere I could collect everything I know about the language and use it at my leisure. I thought about using my limited knowledge on HTML to create a webpage, that way I could have easy access to my work wherever I go.

Monday, February 11, 2008

One of the most challenging tasks a linguist can engage in is that of annotating natural language text for semantics. It is simultaneously interesting, tedious and tricky, which makes it altogether maddening. We perform this task for a variety of reasons: sometimes to create training data for learning algorithms (which was a big topic of discussion at last year's NAACL HLT), sometimes to explicate the semantics of events, as the FrameNet project does. Part of my dissertation is very FrameNet-like, so I do a lot of annotating (I will save my bile-filled hateful remarks about the general crappiness of annotator apps for another post).

Generally speaking, the annotator's task is to read naturally occurring sentences, then identify and tag the semantic roles of the participants involved in the particular event represented by the sentence. It would be easy if all of English was composed of sentences like "Bobby kicked the ball"; that would be sweet. "Bobby" is an AGENT, "the ball" is a PATIENT. Done. Let's move on. But that's not how real language works, is it?

In any case, I have been annotating sentences involving the verb "exclude" recently and I find it's a particularly challenging set. The BNC “exclude” sentence below was difficult to annotate because the exclude event is not clear about its participants:

The new Minister for Health, Dr Noel Browne, a dedicated reformer of the health services and much concerned in-particular with the eradication of tuberculosis in Ireland, modified the earlier bill to exclude the compulsion elements.

At first, I thought “Dr Noel Browne” was the agent doing the excluding, but then I realized it was the bill which excluded. But which bill? I concluded that “the earlier bill” is NOT participating in the exclude event because, logically, it must be the version of the bill that came AFTER the early one which did the excluding. So, this requires a presupposed later bill. So, should I annotate the good Dr. as the agent, or leave this participant alone? (FrameNet's annotator app has the ability to mark an unexpressed element, and I believe this is exactly why, but I don't use their app.) Also, it’s not clear if the “to” means “in order to” as a purpose statement. Is the bill explicitly, directly excluding, or was that simply the intent of the changes? If it’s indirect, that makes Dr. Noel a better candidate for the agent of exclusion.
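To make the dilemma concrete, here is a minimal sketch of how the two competing analyses could be recorded. Everything in it is an assumption of mine for illustration: the `RoleSpan` class and `annotate` helper are hypothetical (this is not FrameNet's format or any real annotator app), the AGENT and PATIENT labels are simply carried over from the "Bobby" example, and the BNC sentence is shortened to its core clause. A role whose `text` is `None` stands for a participant that is understood but never expressed, like the presupposed later bill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleSpan:
    role: str
    text: Optional[str]  # None marks a participant that is understood but unexpressed


def annotate(sentence, target, roles):
    """Check that every expressed role filler occurs in the sentence, then build a record."""
    for span in roles:
        if span.text is not None and span.text not in sentence:
            raise ValueError(f"{span.text!r} not found in sentence")
    return {"sentence": sentence, "target": target, "roles": roles}


# The easy case: every participant is right there in the sentence.
easy = annotate(
    "Bobby kicked the ball",
    "kicked",
    [RoleSpan("AGENT", "Bobby"), RoleSpan("PATIENT", "the ball")],
)

# Shortened version of the BNC sentence, keeping only the core clause.
hard = "Dr Noel Browne modified the earlier bill to exclude the compulsion elements."

# Analysis A: the Minister is the one excluding (the indirect, purpose reading).
analysis_a = annotate(
    hard,
    "exclude",
    [RoleSpan("AGENT", "Dr Noel Browne"), RoleSpan("PATIENT", "the compulsion elements")],
)

# Analysis B: the presupposed later bill excludes, so the agent is unexpressed.
analysis_b = annotate(
    hard,
    "exclude",
    [RoleSpan("AGENT", None), RoleSpan("PATIENT", "the compulsion elements")],
)
```

Both records are well formed, which is exactly the problem: nothing in the data structure tells you which one is right. That judgment still falls to the annotator.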

Friday, February 8, 2008

In discussing the recent Japanese phenomenon of cell phone novels, a reader of Andrew Sullivan’s blog tries to explain why the Japanese language is well suited to this style:

The use of Chinese characters also serves to compact sentences. Since you don't have to actually spell out entire words, as in English, but can represent them with an ideogram, you can say a lot more in a much smaller space.

I will provisionally accept that kanji and kana make typing out written Japanese on a cell phone more efficient than typing out English (in the sense of requiring fewer key strokes; I'd have to test to see if this is really true), but I reject the logical fallacy that this mechanical efficiency leads to greater meaning.

This strikes me as a variation of a phenomenon Ben Zimmer over at Language Log has written about: the all too often misrepresented meaning of the Chinese word for ‘crisis’, wēijī. Underlying both of these is the naïve belief that logograms are inherently more meaningful than alphabetic words. This belief, I reject.

I could be wrong about this, but my hunch is that the human language system takes all written representations of language and converts them into an internal mental representation it’s happy with. There may be differences between the way the brain accesses the meaning of kanji and the way the brain accesses the meaning of alphabetic words (in terms of recognition), but I don’t see any reason to believe that the internal semantic representation of kanji is somehow different than the representation of words. If I’m wrong and there is a difference, this would be an interesting piece of data for the Sapir-Whorf folks.

"Laymen are generally lousy linguists: they do not know what questions to ask, they do not know how to look for answers to them and they are too ready to accept generalizations to which they could easily find counter examples."---James D. McCawley

"Asking a linguist how many languages they speak is like asking a doctor how many diseases they have."---Lynne Murphy (aka lynneguist)