Thoughts on science, history and philosophy of science, atheism, religion, politics, the media, education, learning, books, films, and other fun stuff.

I am a theoretical physicist and currently Director of UCITE (University Center for Innovation in Teaching and Education) at Case Western Reserve University in Cleveland, Ohio. I am the author of three books: God vs. Darwin: The War Between Evolution and Creationism in the Classroom (2009), The Achievement Gap in US education: Canaries in the Mine (2005), and Quest for Truth: Scientific Progress and Religious Beliefs (2000). Disclaimer: The views expressed in this blog are my personal ones and are not those of UCITE or Case Western Reserve University.

April 27, 2005

The reading level of this blog

I came across an interesting website recently. You type in the URL of any site and it comes back immediately with various measures of the site's readability, including the years of education necessary to understand it, its clarity, and so forth. It also provides comparisons on these indices with various standard media such as newspapers and magazines.

So naturally the first thing that I did was put in this blog's URL to see how I shaped up. Here is what I got:

Readability Results for http://blog.case.edu/mxs24
Average words per sentence 16.15
Words with 1 Syllable 3,230
Words with 2 Syllables 1,010
Words with 3 Syllables 561
Words with 4 or more Syllables 415
Percentage of word with three or more syllables 18.71%
Average Syllables per Word 1.65

That much was pretty straightforward. The other three numbers were more mysterious:
Gunning Fog Index 13.94
Flesch Reading Ease 51.07
Flesch-Kincaid Grade 10.15

The site helpfully explains that the Fog Index "is a rough measure of how many years of schooling it would take someone to understand the content. The lower the number, the more understandable the content will be to your visitors. Results over seventeen are reported as seventeen, where seventeen is considered post-graduate level." Looking at the algorithm, it seems to depend entirely on the number of words per sentence and the percentage of words that have three or more syllables.
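For the curious, that dependence is what the commonly published Gunning Fog formula looks like: 0.4 times the sum of the average words per sentence and the percentage of words with three or more syllables. Here is a minimal sketch, assuming the site uses that standard formula along with the ceiling of seventeen it describes. The function name and the cap parameter are my own illustration, not anything taken from the site.

```python
def gunning_fog(words_per_sentence, percent_complex_words, cap=17.0):
    """Commonly published Gunning Fog formula.

    percent_complex_words is the percentage (0-100) of words with
    three or more syllables. The cap is an assumption that mirrors
    the site's statement that results over seventeen are reported
    as seventeen.
    """
    score = 0.4 * (words_per_sentence + percent_complex_words)
    return min(score, cap)


# Figures the site reported for this blog.
blog_fog = gunning_fog(16.15, 18.71)
print(round(blog_fog, 2))
```

Feeding in the two figures reported above gives a value that agrees with the 13.94 the site returned, which suggests (but does not prove) that this is the formula in use.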

So it takes about 14 years of education (roughly college sophomore level) for someone to understand the content of my website. Clearly I am not going to get huge market share with my blog.

For comparison, some Fog Index Scores are given for other publications:

6 TV guides, The Bible, Mark Twain
8 Reader's Digest
8 – 10 Most popular novels
10 Time, Newsweek
11 Wall Street Journal
14 The Times, The Guardian
15 – 20 Academic papers
Over 20 Only government sites can get away with this, because you can't ignore them.
Over 30 The government is covering something up

Since my Fog Index score is close to 15, it seems like it is hard for me to shake the habits of writing in the style of academic papers even in the more casual setting of a blog.

The Flesch Reading Ease number "rates the text on a 100-point scale. The higher the score, the easier it is to understand the document. Authors are encouraged to aim for a score of approximately 60 to 70." So I flunk this score pretty badly, it looks like. This algorithm seems to depend entirely on the number of words per sentence and the average number of syllables per word.
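The commonly published Flesch Reading Ease formula has exactly those two inputs: 206.835 minus 1.015 times the average words per sentence, minus 84.6 times the average syllables per word. A minimal sketch follows, assuming the site uses that standard formula. The function name is hypothetical, and I am plugging in the rounded figures the site displayed, so the result should only be expected to land near the reported 51.07, not on it.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Commonly published Flesch Reading Ease formula.

    Higher scores mean easier text. Whether the site uses exactly
    these constants is an assumption.
    """
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Rounded figures the site reported for this blog.
blog_ease = flesch_reading_ease(16.15, 1.65)
print(round(blog_ease, 2))
```

The small gap between this result and the site's number is plausibly due to rounding: the formula is quite sensitive to syllables per word, and the site shows that figure to only two decimal places.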

The Flesch-Kincaid grade level, like the Gunning Fog Index, "is a rough measure of how many years of schooling it would take someone to understand the content. Negative results are reported as zero, and numbers over twelve are reported as twelve." This seems like the same measure as the Fog Index, but it uses the average number of syllables per word instead of the percentage of words with three syllables or more.
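The commonly published Flesch-Kincaid formula is 0.39 times the average words per sentence, plus 11.8 times the average syllables per word, minus 15.59. Here is a minimal sketch, assuming the site uses that standard formula together with the zero-to-twelve clamping it describes. The function and parameter names are my own illustration.

```python
def flesch_kincaid_grade(words_per_sentence, syllables_per_word,
                         floor=0.0, ceiling=12.0):
    """Commonly published Flesch-Kincaid grade level formula.

    floor and ceiling are assumptions that mirror the site's note
    that negative results are reported as zero and results over
    twelve are reported as twelve.
    """
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return max(floor, min(grade, ceiling))


# Rounded figures the site reported for this blog.
blog_grade = flesch_kincaid_grade(16.15, 1.65)
print(round(blog_grade, 2))
```

With the rounded inputs shown by the site, this lands within a few hundredths of the reported 10.15, which is about as close as two-decimal inputs allow.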

What is one to make of things like this? I find them fun even if I don't take them too seriously. For one thing, you have to be skeptical of these instant computer-generated analyses of such complex things as writing. While these programs are great at doing numbers, one has to be wary of claims that they can accurately measure things like clarity and reading grade level. They all assume that the number of polysyllabic words and the length of sentences are the only factors, and that the nature of the content is immaterial.

This explains the results for the Bible, which had initially puzzled me. It is ranked together with TV Guide, although surely it is a more difficult book to understand. But it does use short words and sentences. This kind of algorithm also might explain why the Wall Street Journal, which one might think is less readable than the New York Times, scores three grades below it.

Suppose I want to become more easily readable. Should I use more words of one syllable? Or shorter sentences? Or both? Or is it the topics that cause the problem? When you write about academic topics, polysyllabic words (two already in this sentence!) creep in without any effort. Can I write about the Copernican Revolution (two more!) and avoid words like heliocentric (another one!)?

To become more readable, must I switch my focus from history and philosophy of science to Britney Spears? There are some prices that are too high to pay even for increased ease of readability…

Comments

Okay, I'll delurk for a bit to say: What fascinating tripe! ;) One very important thing that this doesn't take into account is sentence construction - sentences with complex clause structure (e.g., relative clauses) are much harder to understand than simpler ones. I suppose they're using sentence length as a proxy for that, but I'm not sure it's sufficient. "The man hit by the car survived" will be more difficult for the average person than the longer sentence, "The man was hit by a car and survived."

Posted by Erin on April 27, 2005 09:16 AM

Erin makes a good point. These tests can't really account for the grammatical structure or the true nature of the vocabulary. I just ran a page I've been writing and came up with:

The copy is not overly complicated, but it is full of big words like "administration" which ought not tax the vocabulary of my intended audience. It also has compound sentences full of lists. Thus the paragraphs are bound to confound the grading process.

It's an interesting exercise nonetheless, and reminds me of something I was told back when I was working for a magazine. The editor said that most magazines tried to gear their readability to a level four years beneath the educational level of their intended audience. While the readership could understand more complicated prose, it was found that they preferred not to be overly challenged when reading for pleasure.

While it's true that sentence complexity is a major factor in reading difficulty, I don't think you can say that sentence length is only a factor insofar as increasing length suggests increasing complexity; after all, the longer a sentence is, even if it's structurally very simple, the more the reader has to keep in mind at the same time (until he reaches the end of the sentence).

Of course, there's more complexity to that than just sentence length - it's not always necessary to keep an entire sentence in one's head at once, and structure and punctuation can provide cues to the reader that some things can be stored "early" - but sentence length is a better measure of this than what we'd ordinarily think of as complexity of sentence structure.

Posted by Ran on April 28, 2005 09:12 PM

The Bible is supposedly easier to understand than this blog? Persons who cannot read anything much more complex than TV guides will certainly be unable to understand the writings of Samuel Clemens (Mark Twain); those who have a good command of English may be able to tackle the Bible. However, as the Bible was not written in English (the Jewish Bible was written mostly in Hebrew, with some Aramaic, while the Christian New Testament was written in Ancient Greek), and translations are generally not very good (as even the best translations lose some of the intrinsic meaning in their translations), the Bible in truth is closer to the level of academic papers than it is to difficult newspapers or this blog...

This is interesting. I am your regular writer, and this is what my blog scored:

Total sentences 154
Total words 1533
Average words per Sentence 9.95
Words with 1 Syllable 1064
Words with 2 Syllables 282
Words with 3 Syllables 135
Words with 4 or more Syllables 52
Percentage of word with three or more syllables 12.20%
Average Syllables per Word 1.46

The question is, which translation of the Bible? King James Version is a lot harder to understand than, say, Contemporary English Version. I use Bible software to compare the various translations (http://www.logos.com) but what you notice when doing this is a lot of variety in the language level. It's all over the map.