A couple of months ago, I received a phone call from a journalist from the Herald, who’d seen my appearance on SBS World News, and was interested in writing an article about the mobile phone dictionary project.

A few things have happened between then and now, including conferences, holidays and a didjeridu performance by Nicole Kidman on German TV that seems to have absorbed all local interest in indigenous affairs for a few days, but on Friday morning, two articles appeared in the front page section of the Herald, based in part on an interview I gave a little while back.

The main article is about Phil Parker, the marketing guru who’s recently delisted his ‘books’ on Australian languages (including dictionaries, thesauruses and crossword puzzle books) after his dubious publications hit the virtual shelves, and after a small but vociferous group of linguists complained. The other article is about this mobile phone dictionary project that James and I are getting more and more involved in, and (very quickly) how this sort of project can prevent the theft of data in the first place.

I feel that the article on Philip Parker makes me look like a bit of a whinger. Here’s the operative quote:

Aidan Wilson, a Sydney University linguist who wrote an honours thesis on the Wagiman language spoken north-west of Katherine, said Professor Parker had used the wrong spelling on the cover of his publication Webster’s English To Wageman Crossword Puzzles: Level 1.

Yes, it’s true that Parker had the wrong spelling, but that’s clearly not the reason I’m annoyed at the publication of these books. I’m more annoyed that all of the information within them is publicly available at locations that properly explain the data and the language and cite their sources, while these dictionaries, thesauruses and crossword puzzle books omit all of this information. In short, they are lossy versions of dictionaries already freely available.

The article also makes it sound like we, speakers of indigenous languages and the linguists working with them, have hindered the publication of useful educational resources due to our collective sensitivities. It doesn’t help the situation that Parker probably had his heart in the right place in wanting to further disseminate information relating to critically endangered languages.

A dyslexic, he collects lists of words and publishes dictionaries, thesauruses and crossword puzzles at a loss, he says, in the interests of education. His work has been heralded as a way to create paper resources for resource-starved Third World students.

That’s all well and good, but perfectly good materials already exist – those that the linguists have produced and made freely available in full consultation with the language community. It surely isn’t helpful to convert these into forms in which the information is distilled and compressed such that it no longer conforms to even the minimum standard required for the most basic dictionary. All information apart from the name of the language, the headword and a single gloss has been omitted. That truly is lossy. To give you an idea of what I mean, here’s an entry from the Online Wagiman Dictionary:

You can see that there are no fewer than six tiers of information here: a headword, part of speech, glosses divided into multiple senses, illustrative sentences, their glosses and, importantly, the speaker responsible for each illustrative sentence, as well as related words. Parker’s dictionary merely has this:

ngal-gawu-mang
grandmother
grandchild

I don’t think anyone could reasonably argue that the latter is more useful than the former, or even that it is good for it to be around in addition to the original. I would even go as far as to say that its existence in this form is potentially harmful, and that the harm outweighs any possible benefits of it as an educational resource.
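For the programmers reading, here is a minimal sketch of what I mean by ‘lossy’. It is purely illustrative: the field names and the `strip_entry` and `lost_fields` helpers are my own inventions, the placeholder values stand in for material I haven’t reproduced here, and I have no idea how Parker’s software actually works.

```python
# A full entry, modelled loosely on the tiers described above.
# Field names are hypothetical; "(placeholder)" marks content not reproduced here.
FULL_ENTRY = {
    "headword": "ngal-gawu-mang",
    "part_of_speech": "(placeholder)",
    "senses": ["grandmother", "grandchild"],
    "examples": [
        {
            "sentence": "(placeholder)",
            "gloss": "(placeholder)",
            "speaker": "(placeholder)",
        }
    ],
    "related_words": ["(placeholder)"],
}


def strip_entry(entry):
    """Keep only the headword and its glosses, discarding every other tier."""
    return {"headword": entry["headword"], "senses": list(entry["senses"])}


def lost_fields(full, stripped):
    """List the tiers of the full entry that did not survive stripping."""
    return sorted(key for key in full if key not in stripped)


stripped = strip_entry(FULL_ENTRY)
print(stripped)
print("Lost:", lost_fields(FULL_ENTRY, stripped))
```

The point is that the stripped version can always be regenerated from the full entry, but never the other way around.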

There is another issue that stems from this that deserves attention. Suppose you found one of these dictionaries for a language you’ve never heard of. Let’s say it has some pretty extraordinary stuff in it and you’d like to know more, or even go to the sources and do some fact checking. How do you go about doing it? There are no citations given anywhere, no examples have made it through the distillation process and no speakers are referenced. We’re in a different situation, as we know the original is a good quality publication due to Stephen Wilson’s work, and can pretty much trust that the ‘distilled’ version will more or less be correct. But if Parker gave the same treatment to a highly dubious dictionary, Urban Dictionary, let’s say, then the output would look just as authoritative as something derived from a reputable source in the first place. This clearly makes it very difficult for readers of dictionaries to make informed decisions about the quality of what they’ve got.

I should reiterate that I think Parker had the best of intentions; to further disseminate information about as many languages as possible, something I naturally admire as a linguist. Yet he fails to recognise that lexicography is not easy work; it can’t be done just with a data-harvester, a spreadsheet and a bunch of automatically generated Amazon.com comments and reviews. It takes linguists and lexicographers years to compile the information and resources necessary to create dictionaries. Producing very low-quality dictionaries, thesauruses and crossword puzzle books of some 600 worldwide languages does nothing but undermine their efforts.

I’ve been back in Sydney for almost a week now, having been in Melbourne before that to attend the University of Melbourne Linguistics and Applied Linguistics Postgraduates Conference, where I presented the Kaurna Electronic Dictionary to a sell-out crowd. It was the final leg of an epic, two-part whirlwind tour that began in Wellington almost two weeks ago.

I didn’t get a chance to post this yesterday as I was too busy after the conference having dinner and ‘sampling’ New Zealand’s finest Monteith’s beers[1], but I think the presentation was mostly a success.

I probably should have refined it a little more on Thursday night instead of heading to the pub and, yes, sampling more of New Zealand’s finest Monteith’s beers, because I think it was a little rushed and felt a bit underbaked, but aside from that I got the feeling that the reception was good. Unfortunately I didn’t leave any time for questions, and there were two more talks after mine in the session, meaning people probably let it slip into their subconscious. Nonetheless, there has been some positive feedback.

The four plenary talks were all brilliant. Sarah Ogilvie took a historical look at the impact of James Murray, the first editor of the Oxford English Dictionary, and his understated willingness to be as inclusive of borrowed words as he could, despite some later revisionists’ assertions that he was too stubborn about including foreign words. Bruce Moore, on the other hand, carpeted the Oxford’s more recent publications for sloppy antipodean citations, showing that many of the multiple citations for obscure Australian and New Zealand words, such as Old Thing for a dish of salted beef and unleavened bread, all derived from a single source, a wordlist of Australian words published in 1941 by Sidney Baker, yet the OED has listed them as separate pieces of evidence.

More relevant to my talk, though, were two other talks yesterday on electronic dictionary systems. One was by Dave Moskowitz, who developed the Freelex dictionary creation software for the adult monolingual Māori dictionary[2], mostly because he didn’t want to do it all himself. Freelex, as its name might suggest, is free (as in both beer and speech) and open source, and it runs on a MySQL backend. The other talk was by Gilles-Maurice de Schryver, who developed TshwaneLex, a commercial product that does a similar job, but which runs on a proprietary XML-based format at its backend.

Both of those are at far more advanced stages of development than our humble XML-based multiple format dictionary project. Even so, the demonstrations of the Kirrkirr Kaurna dictionary and the mobile phone dictionary, which I was able to run in an emulator on the projector screen, were received by the audience with a great deal of interest, especially the idea that mobile phones are simply the obvious choice for housing dictionaries in some parts of the world. Such a system, for instance, would be perfect for Southern Africa, which has a similar internet situation to Northern Australia.

Among our many Monteith’s last night, we had a long discussion about some aspects of theoretical lexicography[3], such as what purpose dictionaries are meant to serve. Several of the talks referred to dictionary users being put off by things such as labels, parts of speech, scientific names and so on. These talks mentioned ‘training’ the users in how to get the most out of the dictionary. But another point of view, not necessarily my own, that was put forward last night was that it may be better to instead rebuild the dictionary so that it’s what the user wants and needs, rather than to persevere with a non-user-friendly dictionary that tries to shoehorn the audience into it.

For instance, Julie Baillie gave a talk directly after mine, in which she presented Oxford’s new beginner’s wordlist, which uses corpus techniques to find the words most used by younger children who are just beginning to read and write. The inspiration for her research, which culminated in the production of the Oxford Wordlist, was that children in primary school classes were learning to read and write using wordlists created in the 60s and 70s in Europe. These naturally involved concepts foreign to Australian and New Zealand kids and were for the most part useless for learning to read and write with. She compiled the wordlist by the frequency of words as they appeared in small narratives written by children in the target age groups, so it better reflects those children’s worldviews. So, she has rebuilt the dictionary to suit the needs of the user, rather than force the user to conform their needs to the functions of the dictionary.

Brilliant.

Anyway, that’s one conference down, one to go. I’m off to Melbourne next week for the Unimelb postgrad conference, and perhaps also to discuss the possibility of doing a PhD there beginning in 2010.

These Monteith’s Brewery beers are fantastic, mostly. Unless you like cider you can give the Summer Ale a miss, and the Radler is pretty much like a shandy. By far the best is Original Ale, whose closest Australian analogue would have to be Squire’s Amber Ale. Following closely behind is the Pilsener.

You can tell that I’ve been busy in research this week.

Which reminds me, I really want to find a copy of a good Māori dictionary before I leave.

Further to presenting the Kaurna electronic dictionaries at Australex next week, we’ve been invited to give a talk at the University of Melbourne Linguistics & Applied Linguistics Postgraduate Conference 2008, held November 21-22. It’s a great excuse for me to finally visit Melbourne for the first time in… about 13 years.

Then, this morning, we received confirmation that our abstract has been accepted for the 1st International Conference on Language Documentation and Conservation in Honolulu, Hawai’i in March next year! By which time we should both be well and truly stuck into our next phase of the project, being generously supported by a grant from the Hoffman foundation, which you can read about here.

Unfortunately for me, March next year falls during the teaching period, meaning I won’t be able to attend. But hopefully James will be free then and will present our project to a wider audience.

As I promised last week, I’ve managed to find a copy of the SBS World News report in which I appeared, that mentions and demonstrates the mobile phone dictionary – thanks to Jeremy who recorded it – and so I’ve put it up here.

Just bear in mind that I had no idea that I was going to be interviewed, which is why I’m unshaven and wearing – ahem – a Transformers T-shirt (Decepticons, no less).

I suppose this destroys for good any semblance of internet anonymity that I had feigned.

<UPDATE>
As Michael noticed, I think the large video file was causing some strife for the company that generously hosts this site, Affernet, so I’ve YouTubed it instead.
</UPDATE>

If you’re in Australia, tune in to SBS World News either tomorrow or Sunday night [I just got a call from them; they’ve bumped it back to the weekend] at 6:30pm. I have a feeling that there’ll be an interesting report on indigenous languages in Australia, and the use of modern technology (such as electronic dictionaries and mobile phones) in their revitalisation.

A few weeks ago I mentioned that a bunch of us at Sydney Uni had submitted an abstract for a conference presentation of the Kaurna electronic dictionary.

Just recently, we received the news that our abstract has been accepted. So, if you’re planning on coming along to Australex ’08 at the Victoria University of Wellington in November and you’d like to see the public unveiling of our Kirrkirr and mobile phone dictionaries, then by all means look out for us – by which I mean me.

As it’s been about a month since my last post, it’s probably about time I posted something at least to ensure that this site doesn’t get referred to as a ‘dead blog’. To make matters worse, not only have I not been posting, I’ve also been neglecting my reciprocal blogger duties of reading other people’s work, which I hope is a good indicator of how busy I’ve been. Reading through the myriad of blogs in my feed reader is normally one of my most favoured activities.

So what is my excuse then?

The same old story really: work. But this time the various jobs are a little different. Besides my regular duties as audio engineer at Paradisec and my unrelenting duties as tutor of first-year linguistics, I have been preparing a grant application with a colleague to continue our work developing electronic dictionaries of minority languages, including dictionaries available as Java applications on your mobile phone.

We have also been preparing several papers, conference talks, seminars and so on to detail our project and our process of producing visually-rich multimedia electronic dictionaries from basic wordlists. There are a couple of conferences later in the year that this sort of thing would be perfect for, but we also plan to get a paper sent off to some prestigious lexicography journal somewhere.

As a teaser, here’s an abstract that we sent off to one such conference earlier this month:

Kaurna is the indigenous Australian language of Adelaide and the Adelaide Plains. It has not been actively used since 1929, when the last native speaker died. More recently, efforts have been undertaken to restore Kaurna to a state of community use. One recent project involved the creation of an electronic Kaurna dictionary carried out by a team at the University of Sydney during the first half of 2008. As this was a community-driven project, it had certain requirements, such as the need to archivally preserve the two main documentary sources of Kaurna: a book published in 1840, and a hand-written manuscript from 1857.

In an effort to maximise flexibility, portability and transparency, the Kaurna dictionary project opted for an XML formatted master dictionary that could then be converted to other formats, such as an HTML web-page, or even a printed dictionary. The current means of presentation is through Kirrkirr, a multimedia-rich dictionary visualisation tool.

In this project we also developed software for presenting the dictionary on mobile phones. Mobile phones are almost ubiquitous today and most modern mobile phones have the memory capacity and features necessary for storing and presenting the dictionary content. They therefore present an excellent opportunity for learners of minority languages to have access to a dictionary. The mobile phone dictionary software is currently in its early stages, but we hope to improve it with further work and make it available to people compiling electronic dictionaries for other languages.

I’ll let you know how it all goes.

You can read all about this project, which began with Kaurna, at a post of mine here, and at James’ post here. James’ post also includes example software for download, in case you want to try any of this out.
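To give a rough idea of the ‘one XML master, many output formats’ approach mentioned in the abstract, here is a minimal sketch in Python. It is not our actual code or schema: the element names (`entry`, `headword`, `pos`, `gloss`, `source`), the sample entry and the function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical master file; the real project's schema is not shown here.
MASTER_XML = """<dictionary>
  <entry>
    <headword>sampleword</headword>
    <pos>noun</pos>
    <gloss>first sample gloss</gloss>
    <gloss>second sample gloss</gloss>
    <source>placeholder source &amp; page</source>
  </entry>
</dictionary>"""


def parse_entries(xml_text):
    """Read the XML master into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("entry"):
        entries.append({
            "headword": node.findtext("headword", default=""),
            "pos": node.findtext("pos", default=""),
            "glosses": [g.text or "" for g in node.findall("gloss")],
            "source": node.findtext("source", default=""),
        })
    return entries


def to_html(entries):
    """Render the entries as an HTML definition list."""
    parts = ["<dl>"]
    for e in entries:
        glosses = "; ".join(escape(g) for g in e["glosses"])
        parts.append(f"<dt>{escape(e['headword'])} <i>{escape(e['pos'])}</i></dt>")
        parts.append(f"<dd>{glosses} <small>({escape(e['source'])})</small></dd>")
    parts.append("</dl>")
    return "\n".join(parts)


def to_text(entries):
    """Render the same entries as plain text, e.g. for a print draft."""
    lines = []
    for e in entries:
        glosses = "; ".join(e["glosses"])
        lines.append(f"{e['headword']} ({e['pos']}): {glosses} [{e['source']}]")
    return "\n".join(lines)


entries = parse_entries(MASTER_XML)
print(to_html(entries))
print(to_text(entries))
```

The appeal of this sort of design is that the source attribution lives in the master file, so every output format can carry it along rather than losing it.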

I received this piece of concise, witty and rather insightful spam in the inbox of my work address this morning, and as it amused me I thought I’d share it.

From: Monte Cunningham
Subject: best

your life is crap

Mr Cunningham clearly doesn’t know what he’s talking about. I’ve been so busy between tutoring, compiling electronic dictionaries and my regular work, that I’ve barely had a chance to blog over the past couple of months, but I’ve enjoyed every last bit.

We’ve pretty nearly finished the dictionary, and the teaching semester wraps up soon, but I am soon to be involved in producing a similar electronic dictionary for another Australian language; this time from Sydney. So things show no sign of slowing.

Does anyone know how something that literally translates as ‘dislike of hand’ could be paraphrasable as ‘liberal’?

In the data massaging for the Kaurna electronic dictionary that I spoke of back here, I’ve come across a term whose internal parts indeed come from the words for ‘dislike’ and ‘hand’. I’m certain it’s not a typo, but the definition given is:

Dislike of hand, i.e. liberal

Come on, folks. Put your historical lexicography hats on.

In case it helps, it was written in 1857.

<update>

Something else I just noticed. Murta is the Kaurna word for animal faeces, which has a sub-entry murtaannaitya, which is glossed as ‘European hen’.