I’ve been putting this post off for a while for a couple of reasons: first, I was a little burned out and was enjoying not thinking about my thesis for a while, and second, I wasn’t sure how to tackle this post. My thesis is about eighty pages long all told, and I wasn’t sure how to reduce it to a manageable length. But enough procrastinating.

The basic idea of my thesis was to see which usage changes editors are enforcing in print and thus infer what kind of role they’re playing in standardizing (specifically codifying) usage in Standard Written English. Standard English is apparently pretty difficult to define precisely, but most discussions of it say that it’s the language of educated speakers and writers, that it’s more formal, and that it achieves greater uniformity by limiting or regulating the variation found in regional dialects. Very few writers, however, consider the role that copy editors play in defining and enforcing Standard English, and what I could find was mostly speculative or anecdotal. That’s the gap my research aimed to fill, and my hunch was that editors were not merely policing errors but were actively introducing changes to Standard English that set it apart from other forms of the language.

Some of you may remember that I solicited help with my research a couple of years ago. I had collected about two dozen manuscripts edited by student interns and then reviewed by professionals, and I wanted to increase and improve my sample size. Between the intern and volunteer edits, I had about 220,000 words of copy-edited text. Tabulating the grammar and usage changes took a very long time, and the results weren’t as impressive as I’d hoped they’d be. There were still some clear patterns, though, and I believe they confirmed my basic idea.

The most popular usage changes were standardizing the genitive form of names ending in -s (Jones’>Jones’s), which>that, towards>toward, moving only, and increasing parallelism. These changes were not only numerically the most popular, but they were edited at fairly high rates—up to 80 percent. That is, if towards appeared ten times, it was changed to toward eight times. The interesting thing about most of these is that they’re relatively recent inventions of usage writers. I’ve already written about which hunting on this blog, and I recently wrote about towards for Visual Thesaurus.
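To make the arithmetic behind those rates concrete, here is a minimal Python sketch. The function name `edit_rate` and the counts are hypothetical, chosen only to mirror the towards example; they are not figures from the thesis.

```python
def edit_rate(changed: int, total: int) -> float:
    """Return the share of a form's occurrences that editors changed."""
    if total == 0:
        raise ValueError("the form never occurred, so no rate can be computed")
    return changed / total


# Hypothetical counts: towards appears ten times and is changed eight times.
rate = edit_rate(changed=8, total=10)
print(f"{rate:.0%}")
```

A rate like this only says how often editors touched a form when they met it; it says nothing about how common the form was to begin with, which is why the raw counts matter too.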

In both cases, the rule was invented not to halt language change, but to reduce variation. For example, in unedited writing, English speakers use towards and toward with roughly equal frequency; in edited writing, toward outnumbers towards 10 to 1. With editors enforcing the rule in writing, the rule quickly becomes circular—you should use toward because it’s the norm in Standard (American) English. Garner used a similarly circular defense of the that/which rule in this New York Times Room for Debate piece with Robert Lane Greene:

But my basic point stands: In American English from circa 1930 on, “that” has been overwhelmingly restrictive and “which” overwhelmingly nonrestrictive. Strunk, White and other guidebook writers have good reasons for their recommendation to keep them distinct — and the actual practice of edited American English bears this out.

He’s certainly correct in saying that since 1930 or so, editors have been changing restrictive which to that. But this isn’t evidence that there’s a good reason for the recommendation; it’s only evidence that editors believe there’s a good reason.

What is interesting is that usage writers frequently invoke Standard English in defense of the rules, saying that you should change towards to toward or which to that because the proscribed forms aren’t acceptable in Standard English. But if Standard English is the formal, nonregional language of educated speakers and writers, then how can we say that towards or restrictive which are nonstandard? What I realized is this: part of the problem with defining Standard English is that we’re talking about two similar but distinct things—the usage of educated speakers, and the edited usage of those speakers. But because of the very nature of copy editing, we conflate the two. Editing is supposed to be invisible, so we don’t know whether what we’re seeing is the author’s or the editor’s.

Arguments about proper usage become confused because the two sides are talking past each other using the same term. Usage writers, editors, and others see linguists as the enemies of Standard (Edited) English because they see them tearing down the rules, like that/which and toward/towards, that define it and set it apart from educated but unedited usage. Linguists, on the other hand, see these invented rules as being unnecessarily imposed on people who already use Standard English, and they question the motives of those who create and enforce the rules. In essence, Standard English arises from the usage of educated speakers and writers, while Standard Edited English adds many more regulative rules from the prescriptive tradition.

My findings have some serious implications for the use of corpora to study usage. Corpus linguistics has done much to clarify questions of what’s standard, but the results can still be misleading. With corpora, we can separate many usage myths and superstitions from actual edited usage, but we can’t separate edited usage from simple educated usage. We look at corpora of edited writing and think that we’re researching Standard English, but we’re unwittingly researching Standard Edited English.

None of this is to say that all editing is pointless, or that all usage rules are unnecessary inventions, or that there’s no such thing as error because educated speakers don’t make mistakes. But I think it’s important to differentiate between true mistakes and forms that have simply been proscribed by grammarians and editors. I don’t believe that towards and restrictive which can rightly be called errors, and I think it’s even a stretch to call them stylistically bad. I’m open to the possibility that it’s okay or even desirable to engineer some language changes, but I’m unconvinced that either of the rules proscribing these is necessary, especially when the arguments for them are so circular. At the very least, rules like this serve to signal to readers that they are reading Standard Edited English. They are a mark of attention to detail, even if the details in question are irrelevant. The fact that someone paid attention to them is perhaps what is most important.

And now, if you haven’t had enough, you can go ahead and read the whole thesis here.

I did a project on “comprised of” for my class last semester on historical changes in American English, and even though I knew it was becoming increasingly common even in edited writing, I was still surprised to see the numbers. For those unfamiliar with the rule, it’s actually pretty simple: the whole comprises the parts, and the parts compose the whole. This makes the two words reciprocal antonyms, meaning that they describe opposite sides of a relationship, like buy/sell or teach/learn. Another way to look at it is that comprise essentially means “to be composed of,” while “compose” means “to be comprised in” (note: in, not of). But increasingly, comprise is being used not as an antonym for compose, but as a synonym.

It’s not hard to see why it’s happened. They’re extremely similar in sound, and each is equivalent to the passive form of the other. When “comprises” means the same thing as “is composed of,” it’s almost inevitable that some people are going to conflate the two and produce “is comprised of.” According to the rule, any instance of “comprised of” is an error that should probably be replaced with “composed of.” Regardless of the rule, this usage has risen sharply in recent decades, though it’s still dwarfed by “composed of.” (Though “composed of” appears to be in serious decline. I have no idea why.) The following chart shows its frequency in COHA and the Google Books Corpus.

Though it still looks pretty small on the chart, “comprised of” now occurs anywhere from 21 percent as often as “composed of” (in magazines) to a whopping 63 percent as often (in speech) according to COCA. (It’s worth noting, of course, that the speech genre in COCA is composed of a lot of news and radio show transcripts, so even though it’s unscripted, it’s not exactly reflective of typical speech.)

What I find most striking about this graph is the frequency of “comprised of” in academic writing. It is often held that standard English is the variety of English used by the educated elite, especially in writing. In this case, though, academics are leading the charge in the spread of a nonstandard usage. Like it or not, it’s becoming increasingly common, and the prestige lent to it by its academic feel is certainly a factor.

But it’s not just “comprised of” that’s the problem; remember that the whole comprises the parts, which means that comprise should be used with singular subjects and plural objects (or multiple subjects with multiple respective objects, as in The fifty states comprise some 3,143 counties; each individual state comprises many counties). So according to the rule, not only is The United States is comprised of fifty states an error, but so is The fifty states comprise the United States.

It can start to get fuzzy, though, when either the subject or the object is a mass or collective noun, as in “youngsters comprise 17% of the continent’s workforce,” to take an example from Mark Davies’ COCA. This kind of error may be harder to catch, because the relationship between parts and whole is a little more abstract.

And with all the data above, it’s important to remember that we’re seeing things that have made it into print. As I said above, many editors have to look up the rule every time they encounter a form of “comprise” in print, meaning that they’re more liable to make mistakes. It’s possible that many more editors don’t even know that there is a rule, and so they read past it without a second thought.

Personally, I gave up on the rule a few years ago when one day it struck me that I couldn’t recall the last time I’d seen it used correctly in my editing. It’s never truly ambiguous (though if you can find an ambiguous example that doesn’t require willful misreading, please share), and it’s safe to assume that if nearly all of our authors who use comprise do so incorrectly, then most of our readers probably won’t notice, because they think that’s the correct usage.

And who’s to say it isn’t correct now? When it’s used so frequently, especially by highly literate and highly educated writers and speakers, I think you have to recognize that the rule has changed. To insist that it’s always an error, no matter how many people use it, is to deny the facts of usage. Good usage has to have some basis in reality; it can’t be grounded only in the ipse dixits of self-styled usage authorities.

And of course, it’s worth noting that the “traditional” meaning of comprise is really just one in a long series of loosely related meanings the word has had since it was first borrowed into English from French in the 1400s, including “to seize,” “to perceive or comprehend,” “to bring together,” and “to hold.” Perhaps the new meaning of “compose” (which in reality is over two hundred years old at this point) is just another step in the evolution of the word.

The other day on Twitter, Bryan A. Garner posted, “May I ask a favor? Would all who read this please use the prep. ‘till’ in a tweet? Not till then will we start getting people used to it.” I didn’t help out, partly because I hate pleas of the “Repost this if you agree!” variety and partly because I knew it would be merely a symbolic gesture. Even if all of Garner’s followers and all of their followers used “till” in a tweet, it wouldn’t even be a blip on the radar of usage.

But it did get me thinking about the word till and the fact that a lot of people seem to regard it as incorrect and forms like 'til as correct. The assumption for many people seems to be that it’s a shortened form of until, so it requires an apostrophe to signal the omission. Traditionalists, however, know that although the two words are related, till actually came first, appearing in the language about four hundred years before until.

Both words came into English via Old Norse, where the preposition til had replaced the preposition to. (As I understand it, modern-day North Germanic languages like Swedish and Danish still use it this way.) Despite their similar appearances, to and till are not related; till comes from a different root meaning ‘end’ or ‘goal’ (compare modern German Ziel ‘goal’). Norse settlers brought the word til with them when they started raiding and colonizing northeastern Britain in the 800s.

There was also a compound form, until, from und + til. Und was another Old Norse preposition deriving from the noun und, which is cognate with the English word end. Till and until have been more or less synonymous throughout their history in English, despite their slightly different forms. And as a result of the haphazard process of spelling standardization in English, we ended up with two ls on till but only one on until. The apostrophized form 'til is an occasional variant that shows up far more in unedited than edited writing. Interestingly, the OED’s first citation for 'til comes from P. G. Perrin’s An Index to English in 1939: “Till, until, (’til), these three words are not distinguishable in meaning. Since ’til in speech sounds the same as till and looks slightly odd on paper, it may well be abandoned.”

Mark Davies’ Corpus of Historical American English, however, tells a slightly different story. It shows a slight increase in 'til since the mid-twentieth century, though it has been declining again slightly in the last thirty years. And keep in mind that these numbers come from a corpus of edited writing drawn from books, magazines, and newspapers. It may well be increasing much faster in unedited writing, with only the efforts of copy editors keeping it (mostly) out of print. This chart shows the relative proportions of the three forms—that is, the proportion of each compared to the total of all three.
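For anyone curious how relative proportions like these are worked out, here is a small Python sketch. The function name `relative_proportions` and the counts are made up for illustration; they are not the COHA figures behind the chart.

```python
def relative_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Divide each form's count by the combined count of all forms."""
    total = sum(counts.values())
    if total == 0:
        # No occurrences at all: report zero for every form.
        return {form: 0.0 for form in counts}
    return {form: n / total for form, n in counts.items()}


# Hypothetical token counts for one decade (not real corpus data).
decade_counts = {"till": 50, "until": 940, "'til": 10}
for form, share in relative_proportions(decade_counts).items():
    print(f"{form}: {share:.1%}")
```

Because each share is measured against the total of the three forms, the shares always sum to one, so a rise in one form necessarily shows up as a fall in the others even if its raw count has not changed.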

As Garner laments, till is becoming less and less common in writing and may all but disappear within the next century, though predicting the future of usage is always a guessing game, even with clear trends like this. Sometimes they spontaneously reverse, and it’s often not clear why. But why is till in decline? I honestly don’t know for sure, but I suspect it stems from either the idea that longer words are more formal or the perception that it’s a shortened form of until. Contractions and clipped forms are generally avoided in formal writing, so this could be driving till out of use.

Note that we don’t have this problem with to and unto, probably because to is one of the most common words in the language, occurring about 9,000 times per million words in the last decade in COHA. By comparison, unto occurs just under 70 times per million words. There’s no uncertainty or confusion about the use or spelling of to. We tend to be less sure of the meanings and spellings of less frequent words, and this uncertainty can lead to avoidance. If you don’t know which form is right, it’s easy to just not use it.

At any rate, many people are definitely unfamiliar with till and may well think that the correct form is 'til, as Gabe Doyle of Motivated Grammar did in this post four years ago, though he checked his facts and found that his original hunch was wrong.

He’s far from the only person who thought that 'til was correct. When my then-fiancee and I got our wedding announcements printed over eight years ago, the printer asked us if we really wanted “till” instead of “'til” (“from six till eight that evening”). I told him that yes, it was right, and he kind of shrugged and dropped the point, though I got the feeling he still thought I was wrong. He probably didn’t want to annoy a paying customer, though.

And though this is anecdotal and possibly falls prey to the recency illusion, it seems that 'til is on the rise in signage (frequently as ‘til, with a single opening quotation mark rather than an apostrophe), and I even spotted a til' the other day. (I wish I’d thought to get a picture of it.)

I think the evidence is pretty clear that, barring some amazing turnaround, till is dying. It’s showing up less in print, where it’s mostly been replaced by until, and the traditionally incorrect 'til may be hastening its death as people become unsure of which form is correct or even become convinced that till is wrong and 'til is right. I’ll keep using till myself, but I’m not holding out hope for a revival. Sorry, Garner.

Last week’s earthquake in northern Japan reminded me of an interesting pet peeve of a friend of mine: she hates the word temblor. Before she brought it to my attention, it had never really occurred to me to be bothered by it, but now I can’t help but notice it and be annoyed anytime there’s a news story about an earthquake. Her complaint is that it’s basically a made-up word that only journalists use, and it seems she’s essentially right.

A quick search on Mark Davies’ Corpus of Contemporary American English shows that temblor occurs just over twice as often in newspaper writing as in magazine writing, and more than three times as frequently in newspaper writing as in fiction. It’s effectively nonexistent in academic writing—the only two hits in COCA are actually in Spanish contexts, as are three of the hits under fiction. It’s also worth noting that all of the spoken examples are from news programs. The following chart shows its frequency per million words.
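Since the genres in a corpus differ in size, raw hit counts can't be compared directly, which is why frequencies are given per million words. Here is a minimal Python sketch of that normalization. The function name `per_million` and every number in it are hypothetical; they are not COCA's actual counts or section sizes.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be positive")
    return hits * 1_000_000 / corpus_words


# Hypothetical hit counts and section sizes (not real corpus figures).
newspaper = per_million(hits=90, corpus_words=90_000_000)
magazine = per_million(hits=45, corpus_words=95_000_000)

print(f"newspaper: {newspaper:.2f} per million")
print(f"magazine:  {magazine:.2f} per million")
print(f"ratio:     {newspaper / magazine:.2f}")
```

Comparing the normalized figures, rather than the raw hits, is what makes a statement like "twice as often in newspapers as in magazines" meaningful.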

So what’s to explain the strange distribution of this word? I strongly suspect it’s the doing of what John E. McIntyre calls “the dear old, so frequently misguided, Associated Press Stylebook.” I only have a copy of the 2004 edition (which my wife picked up at a yard sale for 50 cents—don’t worry, I wouldn’t waste good money on it), but the entry for temblor (yes, there’s actually an entry for it) merely refers one to earthquakes. That entry goes on for a page and a half about earthquake magnitudes and notable earthquakes of the past before noting that “the word temblor (not tremblor) is a synonym for earthquake.”

I don’t understand why the AP Stylebook needs to point out spelling and synonymy (I thought those were the jobs of dictionaries and thesauruses, respectively), but I find it interesting that it doesn’t list any other synonyms. Thesaurus.com lists convulsion, fault, macroseism, microseism, movement, quake, quaker, seismicity, seism, seismism, shake, shock, slip, temblor, trembler, undulation, and upheaval, though obviously not all of these are equally acceptable synonyms.

So why does temblor get singled out? I honestly don’t know. I do know that journalists are fond of learning synonyms to avoid tiring out common words, and I know that at least some journalists take the practice to unreasonable levels, such as the teacher who made her students memorize 120 synonyms for said. Whatever the reason, journalists seem to have latched on to temblor, though few others outside the fields of newspaper and magazine writing have picked it up.

In the comments I mused that perhaps gray is only more common because of prescriptions like this one. John Cowan noted that gray is the main head word in Webster’s 1828 dictionary, with grey cross-referenced to it, saying, “So I think we can take it that ‘gray’ has been the standard AmE spelling long before the AP stylebook, or indeed the AP, were in existence.”

But I don’t think Webster’s dictionary really proves that at all. When confronted with multiple spellings of a word, lexicographers must choose which one to include as the main entry in the dictionary. Webster’s choice of gray over grey may have been entirely arbitrary. Furthermore, considering that he was a crusader for spelling reform, I don’t think we can necessarily take the spellings in his dictionary as evidence of what was more common or standard in American English.

So I headed over to Mark Davies’ Corpus of Historical American English to do a little research. I searched for both gray and grey as adjectives and came up with this. The grey line represents the total number of tokens per million words for both forms.

Up until about the 1840s, gray and grey were about neck and neck. After that, gray really takes off while grey languishes. Now, I realize that this is a rather cursory survey of their historical distribution, and the earliest data in this corpus predates Webster’s dictionary by only a couple of decades. I don’t know how to explain the growth of gray/grey in the 1800s. But in spite of these problems, it appears that there are some very clear-cut trend lines: gray became overwhelmingly more common, while grey has severely diminished but not quite disappeared from American English.

This ties in nicely with a point I’ve made before: descriptivism and prescriptivism are not entirely separable, and there is considerable interplay between the two. It may be that Webster really was describing the linguistic scene as he saw it, choosing gray because he felt that it was more common, or it may be that his choice of gray was arbitrary or influenced by his personal preferences.

Either way, his decision to describe the word in a particular way apparently led to a prescriptive feedback loop: people chose to use the spelling gray because it was in the dictionary, reinforcing its position as the main entry in the dictionary and leading to its ascendancy over grey and eventually to the AP Stylebook’s tweet about its preferred status. What may have started as a value-neutral decision by Webster about an utterly inconsequential issue of spelling variability has become an imperative to editors . . . about what is still an utterly inconsequential issue of spelling variability.

My sister-in-law will soon graduate from high school, and we recently got her graduation announcement in the mail. It was pretty standard stuff—a script font in metallic ink on nice paper—but one small detail caught my eye. It says the commencement exercises will take place at “ten-thirty o’clock.” As far as I can remember, I’ve never before heard a rule against using “o’clock” with times other than the hour, but it struck me as wrong.

I checked Merriam-Webster first, but it was no help; all it says is “according to the clock,” though its example sentence is “the time is three o’clock.” I then pulled out my copy of Merriam-Webster’s Dictionary of English Usage, but it didn’t even have an entry for o’clock or clock. So then, because my wife was on the computer and I couldn’t access the OED online, I pulled out my compact OED and magnifying glass to see if it had anything to say.

Once I had flipped to the entry and scanned through the minuscule type, I found this one line: “The hour of the day is expressed by a cardinal numeral, followed by a phrase which was originally of the clock, now only retained in formal phraseology; shortened subsequently to . . . o’clock.” The citations begin with Chaucer and continue up to modern English.

And then, out of curiosity, I checked the Corpus of Contemporary American English, but I couldn’t find any examples of x:30 o’clock. Google, however, turned up plenty of examples, including a thread on Amazon’s Askville asking why you can’t say “11:30 o’clock.” The best explanation there seems to be that since the clock hands aren’t pointing at a specific hour, it can’t be anything-o’clock.

This answer doesn’t seem quite satisfying to me—it doesn’t explain why the hour hand has to be pointing directly at a number or why the minute hand doesn’t matter. But then I remembered that clock originally meant “bell” and that early clocks chimed on the hour (well, I suppose some modern clocks do too, but you see where I’m going). Early mechanical clocks were rather large, and most people measured time not by checking the clock face to see where the hands were, but by counting the number of chimes on the hour. So I would assume that this is why it sounds strange to use “o’clock” with fractions of hours. Thoughts, anyone?

The other day at work I came across a strange construction: an author had used “not surprising” as a sentence adverb, as in “Not surprising, the data show that. . . .” I assumed it was simply an error, so I changed it to “not surprisingly” and went on. But then I saw the same construction again. And again. And then I saw a similar construction (“Quite possible, yada yada yada”) within a quotation within the article, at which point I really started to feel weirded out.

I checked the source of the quote, and it turned out that it was actually a grammatically normal “Quite possibly” that the author of the article I was editing had accidentally changed (or intentionally fixed?). My suspicion was that the author was extending the pseudo-rule against the sentence adverb more importantly and was thus avoiding sentence adverbs more generally.

This particular article is for inclusion in a sociology book, so I thought that perhaps there was a broader rule against sentence adverbs in the APA style guide. I didn’t find any such rule there, but I did find something interesting when I searched for the string “. Not surprising,” in the Corpus of Contemporary American English: sixteen relevant hits. All the hits appeared to occur in social science or journalistic works, ranging from the New York Times to PBS NewsHour to the journal Military History. A similar search for the string “. Not surprisingly,” returned over 1200 hits. (I did not bother to sort through these to determine their relevance.)
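The same kind of string search can be roughly approximated on any plain text you have on hand. The Python sketch below is only an illustration: it does not use COCA or its search interface, and the sample text and the function name `count_sentence_openers` are invented for the example.

```python
def count_sentence_openers(text: str) -> dict[str, int]:
    """Count the two sentence-initial strings via plain substring matching."""
    # The leading ". " and trailing "," mimic the search strings in the post,
    # so "Not surprising," cannot accidentally match "Not surprisingly,".
    return {
        "Not surprising,": text.count(". Not surprising,"),
        "Not surprisingly,": text.count(". Not surprisingly,"),
    }


# Invented sample text, not drawn from any corpus.
sample = (
    "The results were mixed. Not surprising, the data show a gap. "
    "Not surprisingly, editors disagreed. Not surprisingly, nobody noticed."
)
print(count_sentence_openers(sample))
```

Like the corpus search, this misses any instance that opens a paragraph or follows a question mark, so the counts are a floor rather than a full tally.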

I’m not quite sure what’s going on here. As I said above, the only explanation I can come up with is that someone has extended the rule against more importantly or perhaps other sentence adverbs like hopefully that don’t modify anything in the sentence. Not that the sentence adjective version modifies anything either, of course, but that’s a different issue.

If anyone has any alternative explanation for or justification of this construction, I’d be interested to hear it. It still strikes me as a rather awkward bit of English.

“This is the type of arrant pedantry up with which I will not put.” —not Churchill, but maybe some anonymous government official, or maybe no one at all