6.09.2006

sentence complexity

let me briefly and profusely apologize for our extended absence. the last month or so has been filled with vacations, moving, job hunts, comments on my last post, and so on. enough with this 'sorry' blather! on to language...

speaking of jobs, i'm spending the summer as a research assistant in a cog sci lab. i'm helping write/assess a government adult literacy and vocabulary test. that's right! i'm one of those people who make standardized tests! i think there was a point in my life when i thought it would be an interesting career... i wasn't totally wrong.

so far i've written 20 questions. each focuses on one word with multiple meanings (usually including more than one part of speech). i write three fill-in-the-blank sentences that make sense with the target word in the blank, and then three 'matching' sentences that do NOT make sense with the target word in the blank. in all cases, the wrong sentences have to match the right ones in the part of speech of the missing word, and all the sentences have to use the same form of the word (that is, no plurals or past tenses or suchwhat).

now i'm collecting a wide range of statistics on each possible answer sentence so that we can make sure, for example, that the wrong answers aren't systematically different from the right answers.

i'm also keeping a list of statistics about what i'm calling the 'syntactic complexity' of each sentence. these include the number of prepositional phrases, the number of strings of modifiers, the number of non-main clauses and so on. it would be really interesting if any of these statistics ended up correlating with the response times we collect for the sentences.
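for the curious, here's a minimal sketch of how one of these counts could be checked against response times once they exist. it assumes a plain pearson correlation is a reasonable first look, which may not be what the lab ends up using. the function name and the numbers in the usage lines are made up for illustration; they are not data from the test.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical placeholder values, NOT real results:
# prepositional phrases per sentence, and response time in seconds.
prep_phrases = [0, 1, 1, 2, 3]
response_secs = [2.1, 2.4, 2.2, 3.0, 3.3]
print(round(pearson_r(prep_phrases, response_secs), 2))
```

a value near 1 or -1 would suggest the count tracks response time; a value near 0 would suggest it doesn't, at least not linearly.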

interestingly, since the standard readability tests are partially based on the average number of syllables per word, sentences with lots of grammatical words (prepositions, quantifiers, conjunctions, etc), which are mostly short, tend to get low Flesch-Kincaid grade levels but very high 'syntactic complexity' scores. this type of complexity also correlates better with sentence length than the readability stats do, since you can't have a syntactically complex sentence that's only four words.

no real conclusions to draw yet, except to say that a lot of these grammatical categories are tough to define, and sometimes come down to my research-assistant judgement.

if anyone out there in linguo-blog land knows of any official measure of syntactic complexity or any measure that's been used to predict reading times, send it on in!