Last week, I undertook a brief fieldtrip to Pine Creek and Kybrook Farm, Northern Territory, to present the completed Wagiman Electronic Dictionary to the Wagiman community.

It has been a long time coming as several of us have been working on this dictionary in our spare time for the last six months, and so it felt especially good to be able to see a finished product, and better yet, to give it back to the community. In those six months, we successfully integrated recent research into Wagiman plant and animal species by Glenn Wightman, as well as very recent work done by the CSIRO on fish species in the Daly River. The electronic dictionary now contains all that up-to-date information. We also managed to produce sound files for the majority of lexical entries in the dictionary. There are around 1250 sound files in the dictionary altogether, totalling some 15 minutes of high-quality audio.

Lardukkarl nganing-gin using the Wagiman mobile phone dictionary

The Wagiman community are very pleased with the dictionary, and all enjoyed listening to the marluga¹ who recorded each of the sounds. The Wagiman people were also excited to see the mobile phone version of the dictionary. It’s not quite as complete as the computer-based dictionary; it contains far fewer sound files (around 300), and doesn’t contain the sometimes lengthy dictionary comments that accompany many lexical entries. This is an unfortunate constraint of the size of a standard mobile phone screen: too much information can be hard to navigate through.

I also met with representatives of the Northern Territory Department of Education, who were interested in supporting the dictionary and in possible future collaboration. The Wagiman have given it the tick, and the Department are going to go ahead and install the dictionary on all the computers in the schools in Katherine as a first step. We’re hoping that we’ll also be able to get the Northern Territory Library on our side and install the dictionaries on library computers. That way, most computers accessed by children and young adults in the area will have the Wagiman dictionary installed.

In addition to the computer- and mobile phone-based dictionaries, we have also been looking to produce a printed version. Hopefully the Wagiman community will be able to take advantage of the recent increase in interest in Indigenous languages, and sell copies of the dictionary to tourists through various shops in Katherine, Pine Creek and Darwin.

Perhaps the most important thing to come out of this particular project is the demonstration that accessible electronic dictionaries for Indigenous languages can be produced for relatively little extra effort, provided that the language in question has been adequately described, although for many languages this remains a significant obstacle.

The Wagiman people have given us permission to allow the public to download a demonstration version of the Kirrkirr dictionary, which we will try to have ready soon. A full version will be available upon request to the Wagiman community.

In the last few weeks, the topic of bilingual education in Australia has been receiving a fair amount of coverage in the mainstream media. Last week, I happened upon an article in the Herald, echoing earlier reports in voicing the widespread opposition from educators and academics towards the Northern Territory government’s policy of English-only education for the first four hours (leaving only a single hour of tuition) each day. The article quotes Patrick McConvell, co-author of the AIATSIS discussion paper¹ that effectively brought the debate to the forefront of Australian politics.

The coverage of this issue continues tonight at 8:30 (EST) on ABC1, as Four Corners looks at the history of bilingual education in remote Australia, which they also covered way back in 1986, and dissects the policy decision by Marion Scrymgour in October 2008, before she quit her portfolio as Minister for Education. Our very own expert in this field, Dr Jane Simpson, was interviewed for the program several weeks ago, so I suggest watching it.

In other news altogether, I have finally had my honours thesis published online in The University of Sydney’s eScholarship repository. It has been just under three years since it was marked in October 2006, but better late than never! You can access the pdf version here².

For the past couple of weeks I’ve been working my way through my several hours of Wagiman recordings from my recent fieldtrip, all the time remarking on how excellent they are. It’s a combination of a good recording device (a Roland Edirol R-4), a great microphone with a proven track record in the field (a Røde NT4¹), and experience in microphone placement and input gain control². I’m finding the best tokens of all the words I recorded for eventual insertion into the electronic versions of the Wagiman dictionary, including a Kirrkirr instance and a mobile phone dictionary.

Splitting the recordings into some 1500 individual sound files is a time-consuming occupation, and unfortunately, as it’s the only one of my many jobs that isn’t actually paying me anything, higher priority tasks often win out.

Eventually though, we’ll have a Wagiman electronic dictionary ready for distribution, and a down-sampled version of the same ready for installation on mobile phones. So keep posted!

² Gain control was really key in the end, as it was raining most of the time, which would cause low-level hiss if the gain were set too high. Luckily my speaker didn’t mind talking directly and loudly into the microphone, so I was able to keep the gain right down to stop too much ambient noise getting in.

Well, my time in the Territory has come to an end, almost. I’m sitting in Darwin airport waiting for my flight. Not a lot to do in Darwin, so I pretty much came straight here after getting dinner in town. Luckily, I stumbled upon an ethernet port that was obviously for one of those airport internet kiosks – the ones that charge 2 bucks per 8 minutes – that the airport has evidently neglected to disable, meaning I have free broadband internet for the first time in a month!

I’ve got plenty of time to make use of it too; my flight isn’t for another 4 hours¹. I intended to studiously listen to my recordings and split them into individual sound files, one per word, for eventual insertion into the Wagiman Electronic Dictionary, but catching up on old email correspondence, reading old xkcd comics and Language Log posts and downloading the latest Herald cryptic crossword file have sadly taken priority.

My work up here has slowed down a little lately, owing to a bunch of meetings in the community this week, and the fact that my informant and I have been getting a little tired of covering the same territory. I actually got caught short this week and didn’t get to finish off the checking of the dictionary content, but I’ll be able to do some final checks the next time I’m up here, probably in the middle of the year².

As far as the dictionary goes, it’s progressing nicely. I’ve been able to make some additions, and get rid of some words that were always dubious. The more recent ethnobiology research from Glenn Wightman will need to be integrated at some stage, but I can do that from Sydney. The software for mobile phone dictionaries is also going steadily, and you can read all, or most, about that at pfed.info, the website we’ve created for this project. Demo dictionaries can be downloaded or tested online at pfed.info/wksite, although it’s all still in its infancy.

The reaction to the mobile phone dictionary that I’ve been showing off up here has pretty much been universally positive. Everyone I’ve shown it to has been interested in it, even the adults in the community, although the teenagers took a particular liking to it. Not only does this stand to reason, but it bodes well for what we’re actually trying to achieve with this project: increased access to a dictionary of one’s language in a format that’s easy to use. I haven’t wasted any time in showing it to the linguists up here and they too have shown interest, so much so, in fact, that we’ve gone on to wunderkam³ dictionaries of a further two languages: Dalabon and Bilinarra.

We have a couple of other ideas up our collective sleeve that would potentially aid in the wider use of electronic dictionaries of minority languages, but I don’t want to give anything away just yet⁴.

¹ Actually it’s only 3 by now, such is the time it takes me to write a post these days.

² So that I can escape the bitterest of Sydney’s winter, as well as having inadvertently escaped the worst of summer this time around.

³ This is a backformation from Wunderkammer, the name that James came up with to cover the mobile phone dictionary software. So, what else does a Wunderkammer do if it doesn’t wunderkam? My intended meaning for this word is ‘to convert a dictionary into a mobile phone-ready format’. I felt I needed a new word, since a default ‘do’ would imply that we had a hand in producing the content, which would clearly detract from the hard work of the researchers, language workers and speakers.

⁴ More accurately, I don’t want to promise anything that real-world constraints, such as computational impossibility or pecuniary limitations, would prevent me from being able to deliver, but ‘not spoiling the show’ sounds much better.

I can now confirm that I’ll be back in the Territory in a little over a week’s time. It’s my first time back there in over 18 months, and it’ll be my first experience of a Northern Territory wet season, so I can’t wait.

The reason I’m going is to do some work for the electronic dictionary of Wagiman that James and I are producing, including a mobile phone version, using generously donated funds from the Hoffman Foundation. I’ll just be going over the revisions that need to be made to the current dictionary, recording sounds, possibly taking photos for inclusion in the dictionary, and discussing with the community how they’d like it to work.

For one thing, there are plenty of words that I know the older speakers don’t particularly want the younger kids to know about, so I’m guessing they’ll want such words ‘hidden’ from the kids’ version of the dictionary. However, as James pointed out to me, the first words younger kids look up in dictionaries are swear words and taboo body parts, and having them there to gawk over provides a means by which the kids can relate to the dictionary matter.

Also, we’ve decided that it’s about time to set up a website and blog for the project, except we haven’t yet got around to installing the WordPress software. The site will contain information relating to the project, new releases of software, instructions on how to convert Toolbox databases into other formats, and extensive documentation of the whole process.

A couple of months ago, I received a phonecall from a journalist from the Herald, who’d seen my appearance on SBS World News, and was interested in writing an article about the mobile phone dictionary project.

A few things have happened between then and now, including conferences, holidays and a didjeridu performance by Nicole Kidman on German TV that seems to have absorbed all local interest in indigenous affairs for a few days¹, but on Friday morning, two articles appeared in the front page section of the Herald, based in part on an interview I gave a little while back.

The main article is about Phil Parker, the marketing guru who’s recently delisted his ‘books’ on Australian languages (including dictionaries, thesauruses and crossword puzzle books) after his dubious publications hit the virtual shelves, and after a small but vociferous group of linguists complained. The other article is about this mobile phone dictionary project that James and I are getting more and more involved in, and (very quickly) how this sort of project can prevent the theft of data in the first place.

I feel that the article on Philip Parker makes me look like a bit of a whinger. Here’s the operative quote:

Aidan Wilson, a Sydney University linguist who wrote an honours thesis on the Wagiman language spoken north-west of Katherine, said Professor Parker had used the wrong spelling on the cover of his publication Webster’s English To Wageman Crossword Puzzles: Level 1.

Yes, it’s true that Parker had the wrong spelling, but it’s clearly not the reason I’m annoyed at the publication of these books. I’m more annoyed that the entirety of information within them is publicly available at locations that properly explain the data, the language, and cite sources, while these dictionaries, thesauruses and crossword puzzle books omit all of this information. In short, they are lossy² versions of dictionaries already freely available.

The article also makes it sound like we, speakers of indigenous communities and linguists working with them, have hindered the publication of useful educational resources due to our collective sensitivities. It doesn’t help the situation that Parker probably had his heart in the right place in wanting to further disseminate information relating to critically endangered languages.

A dyslexic, he collects lists of words and publishes dictionaries, thesauruses and crossword puzzles at a loss, he says, in the interests of education. His work has been heralded as a way to create paper resources for resource-starved Third World students.

That’s all well and good, but perfectly good materials already exist – those that the linguists have produced and made freely available in full consultation with the language community. It surely isn’t helpful to convert these into forms in which the information is distilled and compressed such that it no longer conforms to even the minimum standard required for the most basic dictionary. All information apart from the name of the language, the headword and a single gloss has been omitted. That truly is lossy. To give you an idea of what I mean, here’s an entry from the Online Wagiman Dictionary:

You can see that there are no fewer than 6 tiers of information here: a headword, part of speech, glosses divided into multiple senses, illustrative sentences, their glosses and, importantly, the speaker responsible for that illustrative sentence, as well as related words. Parker’s dictionary merely has this:

ngal-gawu-mang
grandmother
grandchild

I don’t think anyone could reasonably argue that the latter is more useful than the former, or even that it is good for it to be around in addition to the original. I would even go so far as to say that its existence in this form is potentially harmful, and that the harm outweighs any possible benefits of it as an educational resource.
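To make the contrast concrete, here is a rough Python sketch of the two kinds of entry. The names (`Entry`, `Sense`, `Example`, `distill`) are my own illustrative assumptions, not the actual structure of the Online Wagiman Dictionary or of Parker’s software, and the strings in angle brackets are placeholders rather than real Wagiman data. Only the headword and the two glosses come from the entry quoted above, and I am assuming that the two glosses belong to separate senses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    sentence: str  # illustrative sentence in the language
    gloss: str     # English gloss of that sentence
    speaker: str   # speaker responsible for the sentence


@dataclass
class Sense:
    gloss: str
    examples: List[Example] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: List[Sense]
    related: List[str] = field(default_factory=list)


def distill(entry: Entry) -> List[str]:
    """Keep only the headword and one gloss per sense; drop everything else."""
    return [entry.headword] + [sense.gloss for sense in entry.senses]


# Values in angle brackets are hypothetical placeholders, not real data.
full = Entry(
    headword="ngal-gawu-mang",
    part_of_speech="<part of speech>",
    senses=[
        Sense("grandmother", [Example("<sentence>", "<gloss>", "<speaker>")]),
        Sense("grandchild", [Example("<sentence>", "<gloss>", "<speaker>")]),
    ],
    related=["<related word>"],
)

print("\n".join(distill(full)))
```

Going from `full` to the distilled list is a one-line function; going back the other way is impossible, because the part of speech, the example sentences, the speakers and the related words are simply no longer there.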

There is another issue that stems from this that deserves attention. Suppose you found one of these dictionaries for a language you’ve never heard of. Let’s say it has some pretty extraordinary stuff in it and you’d like to know more, or even go to the sources and do some fact checking. How do you go about doing it? There are no citations given anywhere, no examples have made it through the distillation process and no speakers are referenced. We’re in a different situation as we know the original is a good quality publication due to Stephen Wilson’s work, and can pretty much trust that the ‘distilled’ version will more or less be correct. But if Parker gave the same treatment to a highly dubious dictionary, Urban Dictionary, let’s say, then the output would look just as authoritative as something derived from a reputable source in the first place. This clearly makes it very difficult for readers of dictionaries to make informed decisions about the quality of what they’ve got.

I should reiterate that I think Parker had the best of intentions: to further disseminate information about as many languages as possible, something I naturally admire as a linguist. Yet he fails to recognise that lexicography is not easy work; it can’t be done just with a data-harvester, a spreadsheet and a bunch of automatically generated Amazon.com comments and reviews. It takes linguists and lexicographers years to compile the information and resources necessary to create dictionaries. Producing very low-quality dictionaries, thesauruses and crossword puzzle books of some 600 worldwide languages does nothing but undermine their efforts.

When collecting field recordings, always, always begin each audio file with a little blurb mentioning the date, the location, who’s present, and what language is being researched. It’ll cost you about 10 seconds of each recording and you’ll sound like a bit of a tool repeating yourself, but you’ll save yourself hours of work years later when you (finally) get around to archiving your recordings and you need to find all this information from other sources, like airline booking confirmation emails.

Oh, and transcribe your recordings while they’re fresh in your head, lest you find yourself devoting countless hours of unpaid work to do so when you have a brazillion¹ other things to do.

¹ I’m alluding to a George W. Bush joke here:
One of the president’s advisers rushes into the Oval Office and tells the president that there’s been a terrorist attack in Rio and that 2 Brazilians have been killed.
“Oh my God!” screams the president, to the astonishment of the adviser, who didn’t think the death of a mere 2 people would have fazed the president so much. “How many are in a brazillion?”

A few posts back, I wrote about a book that David Nash had found on Amazon.com, which appeared to be a bi-directional crossword-puzzle book between English and Wageman [sic¹]. It seemed as though these books, and a few others on Amazon on Wageman, contained the very same wordlist collected by a previous researcher and published under copyright at AIATSIS.

This is by no means an isolated incident. Parker has wordlists for around 600 languages stored online, and could potentially create crossword books, dictionaries and thesauri for each of them. See also Peter Austin’s post at Transient Languages and Cultures regarding a similar thing having happened to the Kamilaroi/Gamilaraay dictionary.

Instead of letting this issue slide into the obscurity of my Mabitjbaran, or Archives, I bought a copy of each, English to Wageman and Wageman to English, and have made contact with the ‘author’, Philip M. Parker, to solicit his explanation of what appears to be a blatant violation of copyright restrictions.

First things first though. The books actually appear to be a pretty good educational resource, assuming that the school in Pine Creek is up to the point of recommencing its Wagiman language programs, though I’ve only ever seen fleeting bits of evidence of them ever having taken place². The books comprise probably hundreds of automatically generated crosswords with the solution words in alphabetical order at the bottom. In spite of the books’ copyright restrictions by their supposed author, I’ve scanned a page of one of these books, which you can view here.

I’ve also done a little more background research on the author of these books, Philip M. Parker, and as it turns out, he’s not at all involved with dictionary compiling, language work or language education. In actual fact, he’s a professor of marketing and a generic entrepreneur at the Singapore campus of an international private business and marketing college based in France, called INSEAD. He even has a biography page on Wikipedia, which is interesting to this topic, as it goes into detail about his book publishing career. Apparently he’s quite famous in the marketing and entrepreneurial world.

His fame derives from the fact that he has developed a process that automatically produces and prints books on demand, with little or no interactive work. Each book that gets printed costs him an estimated 12 pence Sterling. So good is his software apparently that he has authored 85,764 books on sale at Amazon.com.

Parker estimates that it costs him about 12p to write a book, with, perhaps, not much difference in quality from what a competent wordsmith or an MBA might produce.

Nothing but the title need actually exist until somebody orders a copy. At that point, a computer assembles the book’s content and prints up a single copy.

Not much difference in quality from what a competent³ wordsmith might produce? If you check a random selection of some of these books, you’d be forgiven for not seeing what sort of quality he’s referring to:

The 2007-2012 Outlook for Tufted Washable Scatter Rugs, Bathmats, and Sets That Measure 6-Feet by 9-Feet or Smaller in India

Riveting. And that costs US$495.00, in case you were wondering.

What Parker does is harvest data, irrespective of what sort of data it is, and churn out books with it. It doesn’t matter if no one’s interested in the statistical prognostications for the Indian mid-sized bathmat industry, because each book is printed if and only if someone actually orders it; a copy may never actually exist. But considering there are libraries around the world that will buy a copy of each and every publication under the sun, Parker is probably earning a lot of money.

As I mentioned at the start, I’ve made contact with Parker and courteously attempted to solicit some information, such as which wordlist he used, and whether there were any copyright protections on that data. This is the response I got back:

Thank you for your concern; there are no copyright violations. Please feel free to copy my puzzles for your teaching⁴.

p.s. translations of words, themselves, cannot hold copyright, only the format in which they are presented (translations of single words are public knowledge; translations of creative works are not). I will later be doing anagrams, poems, rhyming sections, etc.. java-based web games (free to use), etc.

I felt a little confused by this response; I’m not very knowledgeable about copyright law and would have expected that someone’s research and work would be protected under copyright. At the same time though, I’m sure that Parker has done his legal research and knows full well what he can and cannot do. Peter Austin has a legal advantage over me in this respect; his Gamilaraay dictionary included some reconstructions:

It is not possible to copyright common knowledge such as words and meanings. Unfortunately for Parker, some of the quoted forms, like muRumuRu on page 11 are creative works since they are reconstitutions which I have posited on the basis of 19th century published and unpublished amateur recordings (as explained in the preface of my dictionaries — note that the orthographic R is not a Gamilaraay sound but a cover term for where I could not determine whether the source represented a flap rr or a continuant r). Now that is copying of creative work without attribution, in my view.

It may turn out to be a little more difficult to demonstrate some ‘creative work’ with the Wagiman dictionary, and we may just have to accept that legally, this sort of blatant plagiarism will be allowed to continue.

Let my warning be this: If you find a book written by Philip M. Parker that looks interesting, avoid it; you can probably find the content online for free.

¹ We spell it Wagiman these days. Wageman was the spelling adopted by earlier researchers, Ethnologue and AIATSIS. Phonetically speaking, I couldn’t judge either way. For ease of fact-checking, I’ll retain the spelling used in the books.

Over the weekend, David Nash drew my attention to a book that he found on Amazon, which purported to contain bilingual crossword puzzles in English and Wageman¹.

I was a bit perplexed by this, since, well, Wagiman doesn’t have much in the way of practical applications such as second-language learning, that is, of course, beyond the community of Wagiman people. It should be noted at this point though, that this book is not being marketed towards the small community of non-Wagiman speaking Wagiman people, but to a North American audience.

The book is published by a mob called Webster’s Online Dictionary, who I take to have no connection whatsoever to Merriam-Webster, given the look of their respective websites. Theirs appears to contain wordlists of hundreds and hundreds of languages, many of them minority languages, and it seems some of them have been converted to print, albeit in the bizarre form of bidirectional crossword puzzle books.

Here is the product description, as supplied by Amazon, and likely supplied by Philip M. Parker, the person behind Webster’s Online Dictionary:

Webster’s Crossword Puzzles are edited for three audiences. The first audience consists of students who are actively building their vocabularies in either Wageman or English in order to take foreign service, translation certification, Advanced Placement® (AP®) or similar examinations. By enjoying crossword puzzles, the reader can enrich their vocabulary in anticipation of an examination in either Wageman or English.

The second includes Wageman-speaking students enrolled in an English Language Program (ELP), an English as a Foreign Language (EFL) program, an English as a Second Language Program (ESL), or in a TOEFL® or TOEIC® preparation program. The third audience includes English-speaking students enrolled in bilingual education programs or Wageman speakers enrolled in English speaking schools.

EFL, ESL, TOEFL or TOEIC programs being run anywhere near Wagiman country? Really?

However, I can see in this book a benefit for some eventual teaching of Wagiman language in the local school, to help increase literacy in Wagiman, but unfortunately, the book uses an outdated orthography and may actually undermine increased Wagiman literacy efforts.

I wouldn’t want to financially support someone who – it appears – has taken a wordlist published in the public domain² and has created something proprietary, like a book, with the goal of profit in mind, but I think I might still have to have a Wagiman-English crossword puzzle book on my shelf, just for the fun of it.

¹ Wageman was one of the variant spellings. Others include Wakiman (Cook, Austin) and Wogeman (Tryon).

² I find it ironic, furthermore, that while the original wordlist was a public domain web-publication, Webster’s Online Dictionary prohibits automatic harvesting of any of their data. I doubt that they copy-pasted each and every entry from the wordlist.

Not long ago, I received a call from a friend in Kybrook Farm. She informed me that an old lady, one of the last remaining Wagiman speakers, had died a little while earlier.

I’ve never experienced the death of a language informant before. I can only describe it as exceedingly bitter to know that, in addition to the pain of losing a friend, such a death represents another irreversible step towards the loss of one of the world’s unique languages.

Of all the speakers, she was the best to work with. She was a warm and friendly woman who really enjoyed a laugh and would gladly speak to me for hours, selflessly helping me learn her language.