Improve your writing with shell scripts

26 Jul 2010

I’m currently working on my Master’s thesis on Hidden Markov Models. Matt Might wrote an interesting article on three shell scripts to improve your writing. They detect passive voice, weasel words (such as “surprisingly low”) and duplicated words (which are hard to spot when a line break separates them).
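To make the duplicate-word check concrete, here is a rough Python sketch of the idea. It is not Matt’s script: find_duplicate_words is a hypothetical helper, and its regular expression simply looks for a word followed by whitespace (possibly a newline) and then the same word again, ignoring case.

```python
import re

# Hypothetical helper illustrating the idea; not Matt Might's script.
# A word, then whitespace (which may include a newline), then the same word.
DUPLICATE_RE = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)


def find_duplicate_words(text):
    """Return (line_number, word) for each immediately repeated word.

    Line numbers are 1-based and point at the second occurrence, so a
    repetition split across a line break is reported on the later line.
    """
    results = []
    for match in DUPLICATE_RE.finditer(text):
        second = match.end() - len(match.group(1))  # start of the repeat
        line_number = text.count('\n', 0, second) + 1
        results.append((line_number, match.group(1)))
    return results


sample = "This is a\na test of the\nthe duplicate finder."
print(find_duplicate_words(sample))  # [(2, 'a'), (3, 'the')]
```

The \b anchors keep “the theory” from being flagged, since “the” must stand alone both times.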

One remark I received repeatedly was that my paragraphs were much too short. Several consisted of just one or two sentences and could usually be merged with an adjacent paragraph.

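Inspired by Matt’s scripts, I wrote a check of my own (in Python) that detects paragraphs of at most two sentences spanning fewer than four lines. Lines that hold only a single LaTeX command at the start of a paragraph, such as \section{Introduction}, are skipped. Pass one or more file names as arguments and it prints the file name and line number of each short paragraph. It isn’t perfect, and not every short paragraph needs to be longer, but each hit may warrant a closer look at your text.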

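```python
#!/usr/bin/env python3
"""Report suspiciously short paragraphs in LaTeX or plain-text files."""
import re
import sys

# A line holding only a single LaTeX command, e.g. \section{Introduction}.
SINGLE_COMMAND_RE = re.compile(r'^\\\w+\{[^}]+\}$')

MAX_SENTENCES = 2  # flag paragraphs with at most this many sentences...
MAX_LINES = 3      # ...that also span at most this many lines


def count_sentences(line):
    """Count periods on a line, treating '...' as a single period."""
    return line.count('.') - 2 * line.count('...')


def report_short_paragraph(filename, start, sentences, lines):
    """Print filename:line for a paragraph that is short on both counts."""
    if sentences <= MAX_SENTENCES and lines <= MAX_LINES:
        print('%s:%d paragraph of %d sentence(s) / %d lines'
              % (filename, start + 1, sentences, lines))


def process(filename, lines):
    """Check every paragraph (text surrounded by blank lines) in lines.

    Lines containing only a single command at the beginning of a
    paragraph are ignored.
    """
    start = None      # 0-based index of the paragraph's first line
    sentences = 0     # sentence count so far
    line_count = 0    # lines spanned so far
    prev_line = ''
    for linenum, line in enumerate(lines):
        line = line.strip()
        if SINGLE_COMMAND_RE.match(line) and not prev_line:
            # Skip it without updating prev_line, so the next line is
            # still treated as the start of a paragraph.
            continue
        if not line:
            # A blank line ends the current paragraph; paragraphs without
            # any periods (e.g. a lone equation) are not reported.
            if start is not None and sentences:
                report_short_paragraph(filename, start, sentences, line_count)
            start, sentences, line_count = None, 0, 0
        else:
            if start is None:
                start = linenum
            sentences += count_sentences(line)
            line_count = linenum - start + 1
        prev_line = line
    # Report a paragraph that runs to the end of the file.
    if start is not None and sentences:
        report_short_paragraph(filename, start, sentences, line_count)


if __name__ == '__main__':
    for filename in sys.argv[1:]:
        with open(filename) as f:
            process(filename, f)
```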
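The script estimates sentences by counting periods (with “...” counted once), so question marks and exclamation marks are missed and abbreviations such as “e.g.” inflate the count. A slightly more forgiving heuristic, sketched here as a hypothetical variant that is not part of the script, counts each run of “.”, “!” or “?” that is followed by whitespace or the end of the line, optionally after a closing bracket or quote.

```python
import re

# Hypothetical variant: a run of sentence-ending punctuation ('.', '!', '?',
# including '...' or '?!'), optionally followed by closing ')' or quotes,
# then whitespace or the end of the line.
SENTENCE_END_RE = re.compile(r"[.!?]+[)'\"]*(?=\s|$)")


def count_sentences(line):
    """Estimate the number of sentence endings on a single line."""
    return len(SENTENCE_END_RE.findall(line))


print(count_sentences("Wait... what? Yes!"))  # 3
```

Decimals such as “3.5” are no longer counted, but abbreviations are only partly handled: “See e.g. Rabiner.” still counts as two sentences (the period count gives three). Such a function could stand in for the period count in the script.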