A couple of months ago, I received a phone call from a journalist at the Herald, who’d seen my appearance on SBS World News and was interested in writing an article about the mobile phone dictionary project.

A few things have happened between then and now, including conferences, holidays and a didjeridu performance by Nicole Kidman on German TV that seems to have absorbed all local interest in indigenous affairs for a few days. But on Friday morning, two articles appeared in the front page section of the Herald, based in part on an interview I gave a little while back.

The main article is about Phil Parker, the marketing guru who’s recently delisted his ‘books’ on Australian languages (including dictionaries, thesauruses and crossword puzzle books) after his dubious publications hit the virtual shelves, and after a small but vociferous group of linguists complained. The other article is about this mobile phone dictionary project that James and I are getting more and more involved in, and (very quickly) how this sort of project can prevent the theft of data in the first place.

I feel that the article on Philip Parker makes me look like a bit of a whinger. Here’s the operative quote:

Aidan Wilson, a Sydney University linguist who wrote an honours thesis on the Wagiman language spoken north-west of Katherine, said Professor Parker had used the wrong spelling on the cover of his publication Webster’s English To Wageman Crossword Puzzles: Level 1.

Yes, it’s true that Parker had the wrong spelling, but it’s clearly not the reason I’m annoyed at the publication of these books. I’m more annoyed that all of the information within them is publicly available at locations that properly explain the data and the language and cite their sources, while these dictionaries, thesauruses and crossword puzzle books omit all of this information. In short, they are lossy versions of dictionaries already freely available.

The article also makes it sound like we, speakers of indigenous languages and the linguists working with them, have hindered the publication of useful educational resources due to our collective sensitivities. It doesn’t help the situation that Parker probably had his heart in the right place in wanting to further disseminate information relating to critically endangered languages.

A dyslexic, he collects lists of words and publishes dictionaries, thesauruses and crossword puzzles at a loss, he says, in the interests of education. His work has been heralded as a way to create paper resources for resource-starved Third World students.

That’s all well and good, but perfectly good materials already exist – those that the linguists have produced and made freely available in full consultation with the language community. It surely isn’t helpful to convert these into forms in which the information is distilled and compressed such that it no longer conforms to even the minimum standard required for the most basic dictionary. All information apart from the name of the language, the headword and a single gloss has been omitted. That truly is lossy. To give you an idea of what I mean, here’s an entry from the Online Wagiman Dictionary:

You can see that there are no fewer than six tiers of information here: a headword, part of speech, glosses divided into multiple senses, illustrative sentences, their glosses and, importantly, the speaker responsible for each illustrative sentence, as well as related words. Parker’s dictionary merely has this:

ngal-gawu-mang
grandmother
grandchild

I don’t think anyone could reasonably argue that the latter is more useful than the former, or even that it is good for it to be around in addition to the original. I would even go as far as to say that its existence in this form is potentially harmful, and that the harm outweighs any possible benefit it has as an educational resource.
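To make ‘lossy’ a bit more concrete, here is a rough Python sketch of the difference between the two entries. The class and field names are invented purely for illustration and are not the Online Wagiman Dictionary’s actual schema. Every value other than the headword and the two glosses shown above is a labelled placeholder, not real Wagiman data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    sentence: str  # illustrative sentence in the language
    gloss: str     # English gloss of that sentence
    speaker: str   # the speaker responsible for the sentence


@dataclass
class Sense:
    gloss: str
    examples: List[Example] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: List[Sense]
    related_words: List[str] = field(default_factory=list)


def distill(entry: Entry) -> dict:
    """Keep only the headword and bare glosses; drop everything else."""
    return {
        "headword": entry.headword,
        "glosses": [sense.gloss for sense in entry.senses],
    }


# Headword and glosses come from the entry quoted above.
# All other values are placeholders standing in for the real data.
full_entry = Entry(
    headword="ngal-gawu-mang",
    part_of_speech="(placeholder part of speech)",
    senses=[
        Sense(
            gloss="grandmother",
            examples=[
                Example(
                    sentence="(placeholder sentence)",
                    gloss="(placeholder translation)",
                    speaker="(placeholder speaker)",
                )
            ],
        ),
        Sense(gloss="grandchild"),
    ],
    related_words=["(placeholder related word)"],
)

distilled = distill(full_entry)
print(distilled)
```

Nothing in the distilled dictionary lets you get back to the part of speech, the example sentences, the speakers or the related words, which is the sense in which the conversion only goes one way.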

There is another issue that stems from this that deserves attention. Suppose you found one of these dictionaries for a language you’ve never heard of. Let’s say it has some pretty extraordinary stuff in it and you’d like to know more, or even go to the sources and do some fact checking. How do you go about doing it? There are no citations given anywhere, no examples have made it through the distillation process and no speakers are referenced. We’re in a different situation, as we know the original is a good quality publication due to Stephen Wilson’s work, and can pretty much trust that the ‘distilled’ version will more or less be correct. But if Parker gave the same treatment to a highly dubious dictionary, Urban Dictionary, let’s say, then the output would look just as authoritative as something derived from a reputable source in the first place. This clearly makes it very difficult for readers of dictionaries to make informed decisions about the quality of what they’ve got.

I should reiterate that I think Parker had the best of intentions: to further disseminate information about as many languages as possible, something I naturally admire as a linguist. Yet he fails to recognise that lexicography is not easy work; it can’t be done just with a data-harvester, a spreadsheet and a bunch of automatically generated Amazon.com comments and reviews. It takes linguists and lexicographers years to compile the information and resources necessary to create dictionaries. Producing very low-quality dictionaries, thesauruses and crossword puzzle books for some 600 of the world’s languages does nothing but undermine their efforts.

As I promised last week, I’ve managed to find a copy of the SBS World News report in which I appeared, that mentions and demonstrates the mobile phone dictionary – thanks to Jeremy who recorded it – and so I’ve put it up here.

Just bear in mind that I had no idea that I was going to be interviewed, which is why I’m unshaven and wearing – ahem – a Transformers T-shirt (Decepticons, no less).

I suppose this destroys for good any semblance of internet anonymity that I had feigned.

<UPDATE>
As Michael noticed, I think the large video file was causing some strife for the company that generously hosts this site, Affernet, so I’ve YouTubed it instead.
</UPDATE>

I occasionally find myself amused to see in my blog stats that someone has translated my blog into another language. Being so inquisitive, I often follow their lead.

Yesterday morning, I noticed that one of the referring pages was a Google translation of this post into Korean. Naturally, I had a look to see what my blog would look like written in Hangul. As you might expect, it looks really cool, except that I kept noticing a telephone number, the same telephone number, all the way through. Here’s what it looks like:

Strangely, each and every time this telephone number appears, it is preceded by the characters 전화, which, according to a Korean-reading friend of mine, means phone, and the whole thing is immediately preceded by Rudd. Looking at the corresponding English of each line (it pops up when you scroll over a line of Hangul), it appears that the phone number is purely being inserted and has no corresponding constituent in the English.

To put this another way, the string of letters Rudd in English becomes Rudd 전화 +852 2907 2112 in Hangul.

In an attempt to track this a little further to its source, I typed “Rudd” into Google’s translation page, and sure enough, the phone number emerges. This tells me that it’s an artefact of Google translator, and not some mysterious subliminal message that I’ve subconsciously coded into my blog for the sole benefit of Korean readers.

I’m a little discombobulated[1] by this, so if you know anything more about this oddity, or could even posit an explanation, I’d love to hear it.

Someone might even like to put their neck on the line and ring the number…

[1] I’ve always wanted to use that word.

Banengh-nga?

Matjjin is a Wagiman nominal root meaning language, word or story and nehen is the privative case suffix, 'without'.