Things worth knowing

Thursday, 29 September 2011

Yes, I thought it was odd too. Apparently, according to this BBC article, lots of kiddiwinks (over 50% in socially deprived areas) start school without having developed the ability to speak in 'long sentences'. This is a slightly vague term, but the article claims that a class of 5- and 6-year-olds took six attempts to unjumble this sentence:

past the walked we shops

I wouldn't even call that a long sentence, so if this is true it's a bit worrying. The children, claims Wendy Lee from the Communication Trust, are only using short phrases and single words, and say things more typical of much younger children, such as:

went shops

The school in question is in Wythenshawe in South Manchester, but this applies to any area where there are high levels of social deprivation. Essentially, the kids aren't being talked to at home, so they aren't developing language skills at the same rate as better-off children.

So what about this no pens thing? The school, along with 99 others, is having a No Pens Day to try to encourage greater use of longer sentences. Sounds counter-intuitive, getting children not to write, but when you think about it, it makes sense. They're only young so when they're writing they're not using long sentences. And if they can't do it in speech, it's unreasonable to expect them to write lengthy accounts of a shopping trip. Instead, all the lessons on that day are discussion-based, encouraging the kids to talk more. The questions are open-ended rather than requiring single-word answers or short phrases. Who'd have thought it, wanting kids to talk more in lessons?

Tuesday, 27 September 2011

It's well known that some noun phrases that are grammatically singular but semantically plural (like the government, the staff, the band) can occur as the subject of either a plural or a singular verb form (with pronouns obligatorily matching the verb in number):

The government has said it will cut taxes.
The government have said they will cut taxes.

*The government has said they will cut taxes.
*The government have said it will cut taxes.

This is often said to be a US/UK thing, although you do hear both on both sides of the Atlantic. I noticed a restriction which I'm not sure I've seen discussed before, and that's how it works when you have a demonstrative (or a pseudo-demonstrative, a term I've just made up, for when a word such as this is used in a non-deictic way, as in "So I saw this band, called The Semantic Plurals, last night").

I think, and this is true for me though it may not be for everyone, that if you have a singular demonstrative determiner (this rather than plural these), you can't have a plural verb; it has to be singular to match the determiner:

??This band are going to be playing.
This band is going to be playing.

But even more interestingly, you just can't have a plural demonstrative determiner - it's far worse:

*These band are going to be playing.
*These band is going to be playing.

So the semantic plurality of a noun can influence number on the verb, but not the determiner - the determiner has to match the grammatical number of the noun. This is presumably because the features percolate upwards and you'd have a clash at DP level if they didn't match.

Monday, 26 September 2011

There's a really great project going on which you can help with. Here's the text from the website telling you what it's about:

For classics scholars, the vast number of damaged and fragmentary texts from the waste dumps of Greco-Roman Egypt has resulted in a difficult and time-consuming endeavor, with each manuscript requiring a character-by-character transcription. Words are gradually identified based on the transcribed characters and the manuscripts' linguistic characteristics. Both the discovery of new literary texts and the identification of known ones are then based on this analysis in relation to the established canon of extant Greek literature and its lexicons. Documentary texts, letters, receipts, and private accounts, are similarly assessed and identified through key terms and names. Furthermore, an immense number of detached fragments still linger, waiting to be joined with others to form a once intact text of ancient thought, both known and unknown. The data not only continues to reevaluate and assess the literature and knowledge of ancient Greece, but also illuminates the lives and culture of the multi-ethnic society of Greco-Roman Egypt.
The data gathered by Ancient Lives will allow us to increase the momentum by which scholars have traditionally studied the collection. After transcriptions have been collected digitally, we can combine human and computer intelligence to identify known texts and documents faster than ever before. For unknown documents, we can isolate them and begin the long process of identification.
Like any other scientific project, the data will require a lengthy process of vetting and analysis. There are no quick answers or discoveries. We want to make sure our findings are accurate. However, instead of just a few scholars going through the collection one fragment at a time, users of Ancient Lives are allowing professionals to process large batches of data at any given time.
These papyri, as owned and overseen by the Egypt Exploration Society, will then be published and numbered in the Society’s Greco-Roman Memoirs series in the volumes entitled THE OXYRHYNCHUS PAPYRI.

They're just getting lots and lots of people to transcribe the hard-to-read texts into digital text, so that they can read them much more quickly. And it doesn't matter if a few get it wrong, because there'll be enough that those are easily spotted and ignored. This is a brilliant use of crowd-sourcing for research purposes.
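The "enough people that the mistakes get spotted" idea is essentially a majority vote. Here's a minimal sketch of that principle in Python; it is not Ancient Lives' actual method (I don't know what they use), and it assumes the volunteers' transcriptions of a fragment are already aligned letter by letter and the same length. The function name `consensus` and the sample transcriptions are made up for illustration.

```python
from collections import Counter


def consensus(transcriptions):
    """Return the most common character at each position.

    Hypothetical helper: assumes the transcriptions are already aligned,
    i.e. all the same length, with position i referring to the same
    letter on the papyrus in every one.
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")
    length = len(transcriptions[0])
    if any(len(t) != length for t in transcriptions):
        raise ValueError("transcriptions must be aligned to the same length")

    result = []
    for chars in zip(*transcriptions):
        # Take the character most volunteers agreed on at this position.
        letter, _count = Counter(chars).most_common(1)[0]
        result.append(letter)
    return "".join(result)


# Five hypothetical volunteers transcribing a (transliterated) Greek greeting;
# two of them each make one slip, which the majority simply outvotes.
volunteers = ["CHAIRE", "CHAIRE", "CHLIRE", "CHAIRE", "CHAIRO"]
print(consensus(volunteers))  # CHAIRE
```

A real project would also have to align transcriptions of different lengths and weight more reliable volunteers, but the basic point stands: one person's slip is swamped by everyone else's agreement.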

Thursday, 22 September 2011

I always learn some new vocabulary when I'm at LAGB. Sometimes from the language tutorial (every year there's an in-depth look at an unfamiliar language), sometimes just from examples in the papers. Last year, for instance, I learnt that Swahili for lion is simba, presumably where the lovable yet headstrong (and somewhat dim) character in The Lion King gets his name.

This year I learnt that Turkish for man is adam. It's also the same in Hebrew, I think, as the name of Adam (the Biblical one) is supposed to be from Hebrew. People seem to disagree over what it means though - it also means red, like the earth (which Adam was supposedly made from). However, this new word is not surprising, only mildly interesting.

I also learnt the Tundra Nenets word for bread, which is na'an. Tundra Nenets is a Samoyedic Uralic language spoken in Russia. We get the word naan from Urdu (or Persian, according to the OED), which is a whole lot different from Nenets. I don't think there's been a lot of contact between northern Russian peoples and Urdu speakers, so what the heck is going on here? Could it be just coincidence?

Wednesday, 21 September 2011

There's a new book out which I haven't read yet. However, that never stopped anyone posting an Amazon review, so I'll throw my thoughts into the pot. It's called Is that a fish in your ear?: Translation and the meaning of everything, by David Bellos. His son Alex wrote a book called Alex's Adventures in Numberland, which I also haven't read but which is always on Waterstone's featured displays.

I've got another book called The meaning of everything (which is excellent, by the way - by Simon Winchester, about the Oxford English Dictionary), so no points for the sub-title. Points for the title though, which references the Babelfish from Douglas Adams' Hitchhiker's guide to the galaxy.

There was an extract of this book featured in the Independent the other day, describing how Google Translate works. Google Translate is a much-mocked tool, and originally rightly so. It could be relied upon to give you absolute garbage, no matter what you put into it. Hours of fun could be had translating text from one language to another and back again, and sniggering at the Chinese-whispers result. Even better fun if you put it through more than one language on the way. These days, however, Google Translate is disappointingly good. It gets translations pretty much right most of the time (NB It still should NOT be used to translate if you don't know the output language - you cannot guarantee it isn't utter nonsense).

The section featured in the Independent describes how it works. Here's an extract from the extract:

In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.

It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation. Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
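To make "statistical methods to pick out the most probable acceptable version" a bit more concrete, here's a toy sketch in Python. It is emphatically not how Google Translate is actually built; it just illustrates the principle in the extract: count how often a phrase has been paired with each translation in some aligned texts, then pick the most frequent pairing. The tiny "corpus", its counts, and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# An invented, already-aligned set of English/French phrase pairs.
# Real systems learn such pairings from millions of parallel documents.
PAIRED_PHRASES = [
    ("at the end of the day", "à la fin de la journée"),
    ("at the end of the day", "en fin de compte"),
    ("at the end of the day", "en fin de compte"),
    ("at the end of the day", "en fin de compte"),
    ("too hot a wash", "un lavage trop chaud"),
]


def build_table(pairs):
    """Count how often each source phrase was paired with each translation."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, phrase):
    """Return (translation, relative frequency), or None for an unseen phrase."""
    options = table.get(phrase)  # .get() does not create a new empty entry
    if not options:
        return None  # never said before, so there's nothing to copy
    translation, count = options.most_common(1)[0]
    return translation, count / sum(options.values())


table = build_table(PAIRED_PHRASES)
print(most_probable(table, "at the end of the day"))  # ('en fin de compte', 0.75)
print(most_probable(table, "all of my armadillos"))   # None
```

The `None` for the armadillos is the catch: a phrase that has never appeared in the paired texts gives this kind of system nothing to go on.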

This is fascinating, and obviously a good way to do it. After all, people do speak and write in fairly formulaic chunks a lot of the time. It's an efficiency device, so that we don't have to create new expressions from scratch all the time. This is why you get annoying clichés like at the end of the day and in any way, shape or form. It's also why you have standard greetings (how's it going) and ways of expressing yourself like I'm so sick of (X).

And as the author points out, human translators basically work this way too: they can often pre-empt the person they're translating and guess what will come next, based on frequently-used expressions. But this way of translating assumes that everything we say or write (or almost everything) has been said before. One of the first things we tell beginning linguistics students is that we can come up with a completely new sentence, that's never been uttered before, and any speaker of English can understand it. The standard practice is then to come up with some ridiculous sentence, like All of my armadillos have been put through too hot a wash and have shrunk.

I suppose that, faced with this sentence, Translate would take its constituent parts and translate them. So, for instance, it might find the string too hot a wash, or even have been put through too hot a wash, paired with a translation, somewhere in its corpus.

In fact, I just tried it and it didn't fare so well. I put it through an English-French-English process and it came back with this translation:

All my tattoos have been too hot to wash one and have narrowed

If you fiddle with the alternate translations you get there eventually, though I'm not sure how idiomatic it is. Ah well. There's jobs for human translators yet.

Monday, 19 September 2011

The printer in my department at university can be set to print on both sides of the paper. If you choose this option, a message appears on the little screen instructing you:

Do not grab paper until printing is complete.

Grab is a word you don't see written down that often. It has a very specific meaning, involving a hasty movement and perhaps negative connotations. Normally, you're being told not to grab (usually if you're a child), or to grab something quickly with the implication that it's a slightly naughty thing to do. Imagine you're at a buffet, and there are chocolate brownies at the end of the table. You're still getting sandwiches, but you say I'm going to grab a brownie while they're still there. You shouldn't really, because you're effectively queue-jumping, so it's a very slight misdemeanour.

In this case, the verb is very appropriate. It is a bad thing to grab the paper too soon, as you will spoil the printing and possibly damage the printer. But also, grabbing is exactly what you would have to do. They could have said Do not take paper until printing is complete, but by using grab, a very clear image of quickly reaching in and trying to take a bit of moving paper is conjured up, and the repercussions of said action are also brought to mind.

Saturday, 17 September 2011

I learnt recently that the word quasar is a portmanteau from quasi-stellar. A portmanteau is a word made from bits of two other words, the canonical example being brunch (from breakfast and lunch). We have lots of them, and they tend to be a bit silly, like spork or celebrity nicknames like Jedward. Interestingly, Tanzania is a portmanteau, from Tanganyika and Zanzibar, the two countries that made up the new independent republic.

What struck me about quasar is that it's a noun (it's a very energetic and distant active galactic nucleus, according to Wikipedia), but it's formed from an adjective (quasi-stellar). Looking into it, it's quickly obvious that it's because quasi-stellar is only part of a longer noun phrase, quasi-stellar radio source. Presumably, bods at quasar research facilities referred to them first by their full name, then pretty quickly shortened it to quasi-stellar (the radio source part can be assumed, after all, when what you do is look at radio sources) and then that was the part that got blended.

Thursday, 15 September 2011

Just because it's always fun (and easy) to poke fun at Mr Brown and his use (some might say abuse) of language, here's an oldish link to Tom Chivers' favourite 20 sentences from Brown's five books. Some choice examples (the sarky comments are Chivers'):

The Da Vinci Code, chapter 4: As a boy, Langdon had fallen down an abandoned well shaft and almost died treading water in the narrow space for hours before being rescued. Since then, he'd suffered a haunting phobia of enclosed spaces - elevators, subways, squash courts.
Other enclosed spaces include toilet cubicles, phone boxes and dog kennels.

The Da Vinci Code, chapter 5: Only those with a keen eye would notice his 14-karat gold bishop's ring with purple amethyst, large diamonds, and hand-tooled mitre-crozier appliqué.
A keen eye indeed.

The Lost Symbol, chapter 1: He was sitting all alone in the enormous cabin of a Falcon 2000EX corporate jet as it bounced its way through turbulence. In the background, the dual Pratt & Whitney engines hummed evenly.
The Da Vinci Code, chapter 17: Yanking his Manurhin MR-93 revolver from his shoulder holster, the captain dashed out of the office.
Oh – the Falcon 2000EX with the Pratt & Whitneys? And the Manurhin MR-93? Not the MR-92? You’re sure? Thanks.

That last one is my particular bugbear. I once had to review a book which, EVERY time a car, helicopter, gun, or any other weapon or vehicle was mentioned, gave its precise make and model number. It was not only annoying and unnecessary, but it also stopped me following the story, as half the time I wasn't sure whether the baddies were arriving in a helicopter or a car, or even whether they had a gun or a helicopter.

(Disclaimer: I have read most of Dan Brown's books. They are as awful as people say, but they are also as compelling as people say. Seriously, you have to find out what ridiculous twist is going to happen next and what nonsensical plot device he's going to employ to get his character out of whatever pickle he's in. My favourite was when he had his hero survive a jump from a helicopter with no parachute. I won't spoil it, but suffice it to say he was lucky he was carrying his pocket handkerchief.)

Wednesday, 14 September 2011

One of the people I met at LAGB had learnt Na'vi, the fictional language of the film Avatar. I haven't seen the film, because it looks abysmal, but I have read about the language. The director, James Cameron (yes, he of Terminator, Aliens, The Abyss and Titanic - you get the idea) wanted a realistic-sounding language for his aliens to speak. And, almost unbelievably, he very nearly managed to ask a linguist. The creator of Na'vi, Paul Frommer, is not actually working as a linguist, but he did do a doctorate in linguistics so kudos to Cameron, I suppose.

Tuesday, 13 September 2011

I was away most of last week at the annual Linguistics Association of Great Britain meeting (held in Manchester this year). I had a great time, catching up with friends, meeting new ones, hearing some really cool papers and organising my pecha kucha night (which went really well, as good as I could have hoped for, even though there were few participants). My presentation is here.

Sunday, 11 September 2011

Here my linguistics and jewellery interests meet in the centre of the Venn diagram of my brain. This Etsy seller will make a necklace of an IPA symbol (your initial, or just the one you like best). Shame my IPA initials are the same as my orthographic initials.

Wednesday, 7 September 2011

Tomorrow (or maybe today, I don't know how the time difference works) the University of Melbourne is holding the finals of its Three Minute Thesis (3MT) competition. It seems to be a university-wide competition, with a prize for the overall winner.

It's a great idea, as you do need a three-minute summary of your thesis - for your viva, for your 'elevator pitch' and for those people who say 'so, what are you working on?' (actually, 3 minutes is too long for some of them). And the idea of making it a competition is really fun, because it means you have to work extra hard to make it interesting. So many of us either forget that the other person doesn't care, and go on at length about the details which are fascinating to us but meaningless to them, or conversely, apologetically say 'oh, it's really boring' (guilty). Having to make it interesting to everyone is a challenge, but one well worth taking on.

Tuesday, 6 September 2011

I'm at LAGB from tomorrow until Saturday. I'm not presenting in the main session, but I'm hosting a pecha kucha night on Friday, at which I'm also giving a pecha kucha. More on that later, and I'll post a link to my talk.

For now, here's a really interesting-looking research project you can take part in. It says in the blurb that it's

a research project looking into the way in which people associate different vowel sounds with different colours, and whether accent has any influence on this association.

and that it takes 10-15 minutes to complete. More information at the project site.

Monday, 5 September 2011

There's a new film out called Anonymous, which puts forward the well-discussed theory that Shakespeare's plays were actually written by Edward de Vere, 17th Earl of Oxford (the so-called Oxfordian theory). This is not a widely-accepted theory, by the way, and the film has caused a bit of a ruckus among people that care about this kind of thing.

Saturday, 3 September 2011

We were having a conversation about linguistic sci-fi the other night in the pub (best place for it, I find). I love sci-fi, and I love linguistics, and I once read a list by Geoff Pullum (in the collection of his columns, 'The great Eskimo vocabulary hoax') of books that combine the two.

To count as linguistic sci-fi, language/linguistics has to be relevant to the story, in my opinion. It's not enough that they just speak a different language, like Klingon in Star Trek. Although there is some linguistic aspect, as Klingon is obviously supposed to sound aggressive, you couldn't say that Star Trek is linguistic sci-fi just because of that. To count, the language has to be central or at least important to the plot.

The books on Pullum's list turned out to be surprisingly hard to get, but this is one I did find on Amazon: The Languages of Pao, by Jack Vance.
It wasn't that great, to be honest. The basic idea is that someone wants to control the people of Pao, who are docile and peaceful (well, not warlike anyway). To control them and make them a bit more fighty, the scientists at another planet, Breakness, have devised a new language which will allow them (force them?) to be less placid. It's obviously all very Sapir-Whorfian, and an interesting idea. It's unfortunately not really my kind of book, being the kind where the characters must flee things and avenge people and wear big weapons and so on.

This link is exactly that - a list of colour idioms, arranged by language and then by colour. A few (some of the French ones) we've borrowed, but mostly these are unfamiliar to me and not the same in English (i.e. idiomatic).