A blog about translation, language, art and technology

Monthly Archives: January 2013

Nothing is quite so fun as language games – Scrabble, find-a-word puzzles and limericks are all examples of just how deep our fascination with wordplay really is.

Then, from the late 1980s through to the early 2000s, there was a flurry of books, and later internet sites, about poor translations into English, especially as seen in Japan and China or on products made in those countries, the pinnacle being All Your Base Are Belong to Us. This phenomenon is sometimes distastefully named Chinglish, Japlish or Engrish.

More recently, language and translation as entertainment and art have blossomed, with the internet’s reach and access at a level that would have boggled anyone 20 years ago.

The Reconstructionists, a collaboration between illustrator Lisa Congdon and writer Maria Popova, is a yearlong celebration of remarkable women — beloved artists, writers, and scientists, as well as notable unsung heroes — who have changed the way we define ourselves as a culture and live our lives as individuals of any gender.

Every Monday in 2013, we’ll be publishing an illustrated portrait of one such trailblazing woman, along with a hand-lettered quote that captures her spirit and a short micro-essay about her life and legacy.

The project borrows its title from Anaïs Nin, one of the 52 female icons, who wrote of “woman’s role in the reconstruction of the world” in a poetic 1944 diary entry — a sentiment that encapsulates the heart of what this undertaking is about: women who have reconstructed, in ways big and small, famous and infamous, timeless and timely, our understanding of ourselves, the world, and our place in it.

With all the recent activity on the list, you must be aware that OmegaT 2.6 now offers the ability to easily work in teams over the internet.
The function has been discussed at length here and is also very clearly detailed in blog posts written by 2 very active members of the OmegaT community:

As you can see, the SVN/GIT server setting is the hardest part, including the fact that it is not trivial to find free and professional SVN server hosting services.

So, let me inform you that Didier (in fact Didier Briel Consulting and PnS Concept) is offering all the OmegaT l10n teams (ie languages where 2 or more people work on the localization) professional grade hosting for free with unlimited bandwidth.

The French localization team has been using the service for a few months now and it works like a breeze.

I strongly suggest that all the teams move to such a system because it tremendously eases the translation process when a number of people are involved.

Note that this offer is only for those translating the OmegaT software itself, but it is an interesting business idea in general – surely there is room in the market for such a service?

Before Digg and Reddit existed, similar offerings were available from MetaFilter (MeFi) and SomethingAwful. I long ago signed up for Digg and Reddit, but for some reason I never really got the hang of MeFi – until recently.

I joined a week or so ago, and I’m pretty impressed so far. Here are a few examples of stuff that I found just yesterday:

MeFi’s Learn Korean Easy (Oh, the grammar!) reposts artist/adventurer Ryan Estrada’s great comic, Learn to read Korean in 15 minutes, which is fascinating. An internet hole opens up as I go searching for more information on Hangul, the origin of Hangul and its promulgator, Sejong the Great. I know what I’ll be doing on my next interminable wait at an airport, which from the comments seems to be the place most people like to learn the phonetic alphabet system.

I don’t like blogging like this, but it’s hard to find the time with an intermittent Internet. I find titbits, but I rarely follow links – I’ve not watched an online video in almost a year and my inbox has an email thread containing 276 emails with over 400 links to “revisit” once I return to the land of faster bandwidth. As though anyone on the Internet has time for 400+ old links.

However, as someone who is interested in language, it behooves me to relay this content that I’ve found.

I don’t know why I have a low opinion of Will Self, but I do. As a self-important anarchist, I think that I rub up against other self-important *ists. Despite this, I found his latest piece for the BBC, In defence of obscure words, a rollicking good skewering of the stupid, the vapid, the empty. Be it expressing a love of words and language and using them:

I’d point out that my texts were as full of resolutely Anglo-Saxon slang as they were the flowery and the Latinate. I’d observe that English, being a mishmash of several different languages, had a large and exciting vocabulary, and that it seemed a shame not to use it – especially given that it went on growing all the time, spawning argot and specialist terminology as freely as an oyster does its milt.

or the end result of a culture built by the risk averse:

But now that all formerly difficult subject matter is, if not exactly permitted, readily accessible, cultural artificers have no need to aim high. The displacement of aesthetically and intellectually difficult art as the zenith has resulted in all sorts of sad and interrelated phenomena.

In the literary world, books intended for child readers are repackaged and sold to kidult ones, while even notionally highbrow arbiters – such as Booker judges – are obsessed by that nauseous confection “a jolly good read”. That Shakespeare remains our national writer is, frankly, bizarre, given that with his recondite vocabulary, myriad historical references, and convoluted metaphorical language, were he to be seeking publication in the current milieu, his sonnets and plays would undoubtedly also be branded as ‘too difficult’.

As for visual arts, the current Damien Hirst retrospective at Tate Modern is a perfect opportunity to see what becomes of an artificer whose impulse towards difficult subject matter was unsupported by any capacity for hard cogitation or challenging artistry. The early works – the stuffed animals and fly-bedizened carcasses – retain a certain – albeit recherché – shock value, while the subsequent ones degenerate steadily to the condition of knocked-off merchandise, making the barrier between the gift shop and the exhibition space evaporate in a puff of consumerism.

But the most disturbing result of this retreat from the difficult is to be found in arts and humanities education, where the traditional set texts are now chopped up into boneless nuggets of McKnowledge, and students are encouraged to do their research – such as it is – on the web.

I quite enjoyed the brief moment of intellectual challenge that he poses.

Which is why I now turn to a phenomenon that really only exists because of the Internet but grew from the old-style newsprint tropes “Word of the day”, maybe combined with “What in the world” – the longer-form list of obscure, obtuse, unused, hard-to-translate or extinct words, usually in groups of five, eight or ten. I’m not immune to posting links to these lists here on Pineapple Donut, but it’s not often that it’s done anew – as an infographic and without the pronunciation of the words. And to stick it to Mr Self, I found it through the most internet of ways – in RSS from a tumblr called this isn’t happiness, via mentalfloss, and then PopSci, to the original artist’s site, 21 Emotions with No English Word Equivalents.

At first I was put off by the filter of emotive words, but I came around as I thought about it – not only was Pei-Ying’s choice considered in that it provided a focus that’s easy to explain, empathise with and understand, but it gave her the opportunity to explore feelings that don’t have words in English, or any other language presumably, but are unique and identifiable to the (ahem, current) internet age. Unfortunately the artist’s site was so popular after the various postings that its bandwidth limit has been blown, or 509’d in tech speak.

Cory Doctorow fires up more passion in people than I’d expect – I find him interesting, intelligent and sometimes even enthralling, but the argy-bargy that follows him is hard for me to comprehend. He writes for the Guardian on the difference between value and price in the internet era, largely focusing on positive externalities and their exploitation. Most interesting to me is his use of Google and its approach to translating.

A positive externality arises when you do something you want to do that also makes life better for someone else. For example, if you drive your car slowly and carefully to avoid a wreck, a positive externality is that other users of the road have a safer time of it, too. If you keep up your front garden because it pleases you, your neighbours get the positive externality of slightly buoyed-up property values from living on a nicely kept street.

Positive externalities — virtuous cycles — are all around us. Your kid learns to speak because of all the people around her who carry on conversations and because of the TV shows and radio programmes where speaking occurs (as do immigrants like my grandmother, whose English fluency owes much to daytime TV after she came to Canada from Russia).
…
Google is a case-study in harvesting positive externalities. It offered a free, voice-based directory assistance number, and used the interactions users had with its software to build a corpus of common phrases, expressed in multiple accents and under a wide range of field conditions. Then it used this to train the voice-recognition software that powers its Android-based phone-search. Likewise, it mined all the publicly available translations on the web – EU documents that appeared in multiple languages, fan-based translations for subtitles on cult cartoons, and everything else it could find – and used this to train its automated translation engine, providing it with the context that it needed to figure out the nuance and sense of ambiguous phrases.

He contends that the defining mania of the internet era is

resentment over positive externalities. Many people and companies have concluded that if someone, somewhere, is getting value from their labour, that they should get a cut of that value… Many people have accused Google of “ripping off” the public by indexing content, or analysing it, or both. Jaron Lanier recently accused Google of misappropriating translators’ labour by using online translated documents as a training set for its machine-translation engine – an extreme version of many labour-oriented critiques of online business.

leading to

the infectious idea of internalising externalities turns its victims into grasping, would-be rentiers. You translate a document because you need it in two languages. I come along and use those translations to teach a computer something about context. You tell me I owe you a slice of all the revenue my software generates. That’s just crazy. It’s like saying that someone who figures out how to recycle the rubbish you set out at the kerb should give you a piece of their earnings. Harvesting positive externalities involves collecting billions of minute shreds of residual value – snippets of discarded string – and balling them up into something big and useful.

While I enjoy his take, either he or Lanier has missed the mark. If Lanier’s critique were purely about the Google Translator Toolkit it would be understandable, but as is pointed out in the comments, the EU has made its translations available for exactly that purpose. Similarly, all the Free and Open Source software translation files have been publicly available, waiting to be harvested, since the movement started in the early 1990s – it was just a matter of someone thinking to harvest the files, and having the hardware and technical expertise to do so. And indeed, those files remain open source – someone else is welcome to harvest the same files. Google hasn’t locked them up. The translation service, on the other hand, asks for translators’ translation memories (TMs) and stores them – that is taking other people’s work. I guess the question then becomes: can Google guarantee that it hasn’t used those TMs in its translation service?

Behind every language, there is a grammar that determines its structure.

This article explains grammars and common notations for grammars, such as Backus-Naur Form (BNF), Extended Backus-Naur Form (EBNF) and regular extensions to BNF.

To my mind, the discussion of context-sensitive grammars and parsing is poorly explained and needs more detail, and with a little more work the article in general could be more interesting to the non computer scientist. A primer only, really.
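For the non computer scientist, a small sketch may make the idea more concrete. It is an illustrative toy, not something taken from the article: the four-rule arithmetic grammar in the comments, and the names tokenize, Parser and evaluate, are all assumptions made up for this example. Each grammar rule becomes one method, which is the usual shape of a hand-written recursive descent parser.

```python
import re

# Grammar in EBNF (an illustrative toy, not taken from the article):
#   expr   = term { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = number | "(" expr ")" ;
#   number = digit { digit } ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and symbol tokens, rejecting anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns the value of what it parsed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr = term { ("+" | "-") term }
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term = factor { ("*" | "/") factor }
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor = number | "(" expr ")"
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because term sits below expr in the grammar, multiplication binds tighter than addition without any extra code: evaluate("2 + 3 * 4") and evaluate("(2 + 3) * 4") give different answers purely because of how the rules nest.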

Of course, they have a tag for the bestof2012 as curated by the members – anyone can use the tag, so there are lots of interesting opinions from around the world.

I’m also looking forward to the Jon Spencer Blues Explosion live at (our favourite American radio station) WFMU (“Woof-Moo”) gig, and finally “katya-oddio: Late ’70s to Mid ’80s Heroes”, which includes some doozies from Superchunk, The Units, Gang of Four, The Damned, Half Japanese, Paul Westerberg, David Byrne, The Scientists – look, it’s free, right – just go and get it already.