Automatically remove wordiness from your writing

I recently started re-reading William Zinsser's On Writing Well. Zinsser emphasizes simplicity in writing. To reduce wordiness, he implores the writer to remove needless words and phrases:

"I might add," "It should be pointed out," "It is interesting to note that" how many sentences begin with these dreary clauses announcing what the writer is going to do next? If you might add, add it. If it should be pointed out, point it out. If it is interesting to note, make it interesting. Being told that something is interesting is the surest way of tempting the reader to find it dull; are we not all stupefied by what follows when someone says, "This will interest you"? As for the inflated prepositions and conjunctions, they are the innumerable phrases like "with the possible exception of" (except), "due to the fact that" (because), "he totally lacked the ability to" (he couldn't), "until such time as" (until), "for the purpose of" (for).

It's not only dry corporation speak that you should worry about. Actually, what I mean to say is that a little bit of wordiness totally creeps into informal writing way more than you'd think. If you do any sort of writing on the web, you seriously need to think about editing, and more often than not, this tool can help point out some bad habits.

You might be concerned that your writing will lose its personality. Zinsser goes on to say:

You will reach for gaudy similes and tinseled adjectives, as if "style" were something you could buy at a style store and drape onto your words in bright decorator colors. (Decorator colors are the colors that decorators come in.) Resist this shopping expedition: there is no style store. ... Style is organic to the person doing the writing, as much a part of him as his hair, or, if he is bald, his lack of it. Trying to add style is like adding a toupee.

You don't want your blog to wear a toupee, do you? Writing style isn't about needless words. Once you remove them, your thoughts will shine through, clearer and more powerful, and then you can build them back up. This takes time, but your readers will appreciate it.

Using sources on the web, I came up with about 600 simple substitution rules to cut out wordy phrases, and encoded them into a Python script.
Along with other sources, I used Jeff Atwood's Coding Horror blog to train it, [edit] as he seems to have a high wordiness factor, because I wondered if I could get a web celebrity to notice my little blog, and it totally worked.
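
To give a rough idea of what substitution rules like these can look like in Python, here is a minimal sketch. It is not the actual script: the `RULES` table, the `trim_wordiness` function, and the case-matching helper are all hypothetical names made up for illustration, and the handful of rules comes from the Zinsser passage quoted above, plus one filler word to show elimination.

```python
import re

# Hypothetical sample rules (wordy phrase -> shorter replacement).
# An empty replacement eliminates the phrase. Keys must be lowercase.
RULES = {
    "with the possible exception of": "except",
    "due to the fact that": "because",
    "until such time as": "until",
    "for the purpose of": "for",
    "totally": "",
}


def _match_case(replacement, original):
    """Capitalize the replacement if the phrase it replaces was capitalized."""
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def trim_wordiness(text, rules=RULES):
    """Apply every substitution rule to text and return the shorter version."""
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def substitute(match):
        phrase = match.group(0)
        return _match_case(rules[phrase.lower()], phrase)

    result = pattern.sub(substitute, text)
    # Collapse the double spaces that eliminated phrases leave behind.
    return re.sub(r" {2,}", " ", result).strip()


if __name__ == "__main__":
    print(trim_wordiness("He stayed home due to the fact that it rained."))
```

A sketch like this only knows the phrases in its table, so any text that avoids them passes through unchanged.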

I have frequently demonstrated my verbal communication skills throughout my career. Most notably, during my degree, I delivered a presentation on Molecular Clocks to my fellow students. This presentation depended heavily on my verbal communication skills as the subject area was very unfamiliar to my audience. This presentation was truly a success, as well as my group leader praising my work, many of my fellow students mentioned that my presentation clarified the theory (behind molecular clocks) for them.

I typed in this phrase: "I wish to express my heartfelt desire to perform the act of coitus with that extremely beautiful and long-legged woman." Guess what happened? Nothing. You as one man cannot possibly code a program to remove wordiness. Especially one that can be loaded on a web page that I don't care enough about to see if it's HTML or CSS. Whatever. You'll probably point out somewhere where I misread the purpose of this utility, but it seems fairly straightforward.

I don't like that it rewrites your text. You have to diff the result yourself, especially since its simple algorithm mangles some things, like your first paragraph. Look at the `style' program that comes with many *nix distros for inspiration. I even have an Emacs mode that highlights words for me that style doesn't like. I decide on my own what to change.

I think we are moving much too far in the direction of simplification. One of the reasons many older works, both of fiction and non-fiction, are more enjoyable to read for many of us is that the language tends to be much richer and sentences longer and their structure more complex. There are a few modern novelists who reject the notion that everything needs to be simplified and shortened and hence write wonderfully rich, engaging literature. I can understand shortening and simplifying business language, but that's about it.

I tried one of my blog entries and it did not remove even a single word! Here is the post:

Whenever you are discussing about doing something in a group, there is a question you should ask. It is the most important question. It will make or break your work.

It is NOT “Who all are interested in the idea?”

It is “Who all are not really interested?”

The people who are not interested in doing something great, people who are pissed off by the idea of being great, and lazy people, and great eaters, and great sleepers, and people who do not talk, can lower the morale of your group significantly, and eventually you will drop your great idea. Better, drop those people.

My approach only substitutes or eliminates words. It can't revise much of your text, because real revision goes beyond simple substitution. Only the author can change a sentence from the passive to the active voice. For example, to change "M was given", we need to decide who did the giving, and then write something like "The company gave M..."

First, let me review a bit of history from my (rather biased) perspective. M was given and doggedly maintained sole responsibility for the expenditures and accounting associated with the Rebuilding Project, including single signature responsibility. He assumed that his responsibility was fulfilled completely by reporting month after month that the Project was “on schedule and on budget”. The Finance Committee volunteered B, a trained accounting professional, to create and support the appropriate financial records. This offer was rejected. Instead, A was appointed by M to fulfill this role, despite his apparent lack of skills in this area. A year later, in A’s own words, there are “no records” (invoices, checks, etc.), which document and account for our activities. A even has this wrong; M working with C, our external accountant, has now completed reconciliation of the bank statements through 2007. However, we do lack the associated invoices.

If one apple costs $1, how much would five apples cost? How about 500? In everyday life, when you buy more of something, you get more bananas for your buck. But software companies are bucking the trend.

Back in 2007, I created a rhyming engine based on the public domain Moby pronouncing dictionary. It simply reads the dictionary and looks for rhyming words by comparing the suffix of the words' pronunciations. Since that time, I have made some improvements.
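
As a rough illustration of the suffix comparison described above, here is a minimal sketch. It is not the engine itself and does not read the Moby file: the tiny `PRONUNCIATIONS` table, its phoneme spellings, the `find_rhymes` function, and the `suffix_len` parameter are all assumptions made up for this example.

```python
# Hypothetical pronunciations, each a list of phonemes.
# This is an invented stand-in, not the Moby dictionary format.
PRONUNCIATIONS = {
    "cat": ["k", "ae", "t"],
    "hat": ["h", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "dog": ["d", "ao", "g"],
    "log": ["l", "ao", "g"],
}


def find_rhymes(word, pronunciations=PRONUNCIATIONS, suffix_len=2):
    """Return words whose last suffix_len phonemes match those of word."""
    target = pronunciations[word][-suffix_len:]
    return sorted(
        other
        for other, phonemes in pronunciations.items()
        if other != word and phonemes[-suffix_len:] == target
    )


if __name__ == "__main__":
    print(find_rhymes("cat"))
```

Comparing phonemes rather than spelling is what keeps "cut" out of the rhymes for "cat" even though both end in the letter t.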