Find and Replace: Do It Once, Do It Twice

Out of all the small jobs that make up the big job of getting a book ready for publication, proofreading is the job nobody wants. It is NO FUN.

It’s exacting, it’s painstaking, it reduces an otherwise interesting piece of writing to boring little components that must be examined individually. If your attention wanders or if you get caught up in the story (it’s harder to proofread a rousing good story than a so-so one), you can miss errors. Ideally, any project should have at least two proofreaders. This isn’t an ideal world, however, and not everybody has the funds or the qualified (and indulgent) friends to get two reads.

When I build an ebook, I either proofread it myself or send a proof copy to the writer to proofread. Sometimes we both proofread it. All in the hopes of rooting out the boo-boos and gremlins before a paying customer does.

I have, of course, learned a few tricks along the way. One of the most valuable tools in my arsenal (second only to Webster’s 9th) is the Find/Replace function. This is especially true since I have found that most writers have a tendency to repeat mistakes. One does need to be careful, though, about global FIND/REPLACE, or you might end up with something like this:

Barnes & Noble was briefly suspected of employing an outrageous anti-Amazon marketing strategy in May after blogger Philip Howard noticed that a version of Tolstoy’s “War and Peace” sold by the chain store had substituted “nook” for every instance of the word “kindle” throughout the text, resulting in sentences like, “It was as if a light had been Nookd in a carved and painted lantern….” The e-book turned out to have been published by a third-party company, Superior Formatting Publishing, who issued an apology (still posted on the company’s Web home page) explaining that it had accidentally applied the “find and replace” function to the entire text when reformatting the Kindle version of the book for the Nook platform.

The stuff of a proofreader’s nightmares.
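
For anyone who scripts their clean-up on a plain-text copy, here is a rough Python sketch of how that kind of accident happens and one way to guard against it. The function name and the sample sentence are made up for illustration. The guard is simply to match the search term only where it stands alone as a whole word, in the exact case given.

```python
import re

def replace_whole_word(text, old, new):
    """Replace `old` only where it stands alone as a word, in the exact case given."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, text)

# Hypothetical sample sentence, invented for this example.
sentence = "Kindled by hope, she opened the Kindle edition."

# A blind replace-all hits the string wherever it appears, even inside other words.
naive = sentence.replace("Kindle", "Nook")

# The whole-word version leaves "Kindled" alone.
careful = replace_whole_word(sentence, "Kindle", "Nook")

print(naive)
print(careful)
```

Even with the whole-word guard, it is still worth reading the results before saving over the original.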

Every text handling program has its own set of rules and functions. I can’t possibly cover them all here. I suggest you play with your program’s FIND/REPLACE function and figure out what it can and cannot do. The one thing that every program has in common is that it searches for an exact string of characters. That string can include spaces and punctuation.

There are some F/R searches I do as a matter of course. The first is for extra spaces. Extra spaces are the bane of ebooks. They all need to be rooted out. I run searches for double spaces between sentences within paragraphs, and for extra spaces at the beginnings and ends of paragraphs. I also run searches for extra paragraph returns.
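
The same three searches can be sketched in Python for a plain-text copy of the manuscript. This is only an illustration, and it assumes paragraphs sit one per line with a single blank line between them; the function name is invented here.

```python
import re

def clean_spacing(text):
    """Tidy spaces and blank lines. Assumes one paragraph per line,
    with a single blank line between paragraphs."""
    # Two or more spaces in a row become one space.
    text = re.sub(r" {2,}", " ", text)
    # Spaces at the very start or very end of a paragraph are removed.
    text = re.sub(r"^ +| +$", "", text, flags=re.MULTILINE)
    # Three or more returns in a row shrink to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text

messy = "  One.  Two. \n\n\n\nThree."
print(repr(clean_spacing(messy)))
```

Because the first pattern matches a run of any length, a single pass is enough here; there is no need to repeat the search the way a two-spaces-to-one replace requires.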

The second routine search I do is for backward quote marks and apostrophes. MS Word, in particular, has a bad habit of turning quote marks the wrong way, especially when the quote marks sit next to em or en dashes or at the beginning of truncated words. Here the basic rules of punctuation are useful. For instance, the left double quote belongs at the beginning of a quoted passage, so a right double quote should never follow a space or start a paragraph. I search for a space followed by a right double quote, and for a paragraph return or new line followed by a right double quote. I run the opposite search, for left double quotes at the ends of sentences, to catch closing quotes that were turned the wrong way.
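
Those two searches translate fairly directly into a pattern. The Python sketch below works on plain text and only flags candidates for a human to look at; it does not change anything. The names are invented for the example, and it covers double quotes only, not apostrophes.

```python
import re

LEFT, RIGHT = "\u201c", "\u201d"  # curly opening and closing double quotes

SUSPECT = re.compile(
    rf"(?:^|(?<=\s)){RIGHT}"   # closing quote at line start or right after whitespace
    rf"|{LEFT}(?=\s|$)",       # opening quote right before whitespace or line end
    re.MULTILINE,
)

def find_backward_quotes(text):
    """Return (offset, character) for each quote mark that looks turned the wrong way."""
    return [(m.start(), m.group()) for m in SUSPECT.finditer(text)]

# Hypothetical samples: the first has both marks reversed, the second is correct.
bad = f"He said, {RIGHT}Stop.{LEFT} Then he left."
good = f"{LEFT}Stop,{RIGHT} he said."

print(find_backward_quotes(bad))
print(find_backward_quotes(good))
```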

Another routine search is for proper names and place names. When I proofread I make a list of preferred spellings. Flying fingers or attention lapses trip up writers. Sometimes the misspellings look right and are easily missed. Take my name for instance. “Jay” looks right, but I spell it “Jaye.” I’ll do a search for “Jay” and “Jay’s” to catch any instances where the “e” was dropped.

The same thing goes for preferred spellings. A word such as “judgment” is also correctly spelled as “judgement.” It doesn’t matter to me what the writer prefers–consistency is my fallback. If the writer prefers the former, I will do a search for the latter and change any instances I find.
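
A list of unwanted spellings can be checked in one pass. The Python sketch below is a minimal illustration on plain text; the function name and sample sentence are made up. It looks for each variant as a whole word, so a search for "Jay" skips the correct "Jaye" but still catches "Jay's".

```python
import re

def find_variants(text, variants):
    """Return {variant: [offsets]} for each unwanted spelling, whole words only."""
    hits = {}
    for variant in variants:
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b")
        hits[variant] = [m.start() for m in pattern.finditer(text)]
    return hits

# Hypothetical sample: the preferred forms are "Jaye" and "judgment".
sample = "Jaye wrote it. Jay's judgement was sound, Jaye said."
report = find_variants(sample, ["Jay", "judgement"])
print(report)
```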

I’ve worked on quite a few backlist books that have been scanned and run through OCR. Do enough of them and you start recognizing common OCR errors. For instance, misreading the letter “e” as a “c”. Spell check will catch the most egregious errors, but if the text is supposed to be “eat” and the OCR reads it as “cat” then spell check is useless. It doesn’t take much time to run a search for the word “cat” to make sure each usage is what the writer intended. Another common problem with scanned books is that typesetters often use hyphens and en dashes to space text on a line. Finding those is a bear, but F/R is a big help in rooting out the many permutations that end up as errors in an ebook.
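
When the suspect word is a real word, as with "cat", the only test is whether it makes sense where it sits. A rough Python sketch of that check, with an invented function name, lists every whole-word occurrence along with a little of the text on either side so each one can be judged by eye.

```python
import re

def show_in_context(text, word, width=20):
    """List each whole-word occurrence of `word` with up to `width` characters either side."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    snippets = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        # Flatten line breaks so each snippet prints on one line.
        snippets.append(text[start:end].replace("\n", " "))
    return snippets

# Hypothetical sample: the first "cat" should have been "eat".
page = "We sat down to cat dinner while the cat slept in its category."
for snippet in show_in_context(page, "cat", width=12):
    print(snippet)
```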

I can’t possibly cover every F/R trick. If you, while you are proofreading your own work, get into the habit of assuming you have a tendency to repeat certain errors, you can use F/R to help you create a cleaner ebook. If you find a goof, run a quick search to see if you repeated it elsewhere.

14 thoughts on “Find and Replace: Do It Once, Do It Twice”

This is easy. In the FIND box tap the space bar twice (to make two spaces–you won’t SEE them, but the program does) and in the REPLACE box tap the space bar once. Do a REPLACE ALL and repeat until the message box tells you it can’t find any more. If you are using Word and want to find extra spaces at the ends and beginnings of paragraphs, the search term for a paragraph return is ^p. Search for that with a space either before or after the ^p and you’ll find the extra spaces.

When proofing an ebook on an actual device, you WILL NOT be able to spot extra spaces. That’s because devices that justify text on the screen do so by manipulating the spaces between words and sentences. Often it will appear that the spacing is wrong, but it isn’t. Extra spaces can only be found in the actual document.

You might want to add soft returns to your list. I’ve found that many people use SHIFT-RETURN without realizing it. When typesetting a print book, I use it as a last resort (after kerning, letterspacing, hyphenation, etc. have failed) to force a line to break where I want it to within a paragraph. This feature is dangerous in most hands, and probably quite destructive when it comes to HTML.

My best (worst?) search-and-destroy experience was when I managed to replace every instance of a period, space, capital A with a period and a space. Ergo, I deleted the As in every sentence that began with A, and then I saved the job! Since I’d already done a boatload of other correction work on it (this was in Quark) prior to this bone-headed maneuver, I had a choice: go back to the old version and re-input all the editor/author proof changes (massive) or go through the 40-page section and fix the As by hand.

It was a sobering moment, especially since I had many years of typesetting experience under my belt by then.

For those who’re interested, you can find a soft return by putting ^l in your search box, but don’t change it arbitrarily to a hard return (^p) because it might actually be in the middle of a paragraph where you would want a space instead. This is one search best done one at a time.

Hi Maggie, No doubt we could fill a book with horror stories about too-swift clicky fingers. Heh. I’ve learned the hard way to only do Global Replace Alls when I am 100% certain I’m working with unique character strings.
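
One way to test whether a string really is that unique is to preview the replacement before committing to it. The Python sketch below is only an illustration with an invented function name; it counts the matches and shows a before-and-after snippet for each one without touching the text.

```python
import re

def preview_replace(text, find, replace, width=12):
    """Return (count, samples) showing each proposed change; `text` is not modified."""
    samples = []
    for m in re.finditer(re.escape(find), text):
        lead = text[max(0, m.start() - width): m.start()]
        tail = text[m.end(): m.end() + width]
        before = lead + m.group() + tail
        after = lead + replace + tail
        samples.append((before, after))
    return len(samples), samples

# Hypothetical sample: replacing period-space-A with period-space.
draft = "It ended. A dog barked. Another day."
count, samples = preview_replace(draft, ". A", ". ")
print(count)
for before, after in samples:
    print(repr(before), "->", repr(after))
```

If the count is higher than expected, or any of the "after" snippets looks wrong, the search string is not unique enough for a Replace All.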

As for soft returns, I’m finding the use of them to be rather arbitrarily handled by ereader devices. Some respect them, some don’t. Even though ebooks are based on HTML, it doesn’t mean that every bit of HTML coding works in them. I find it safer in general to not mess with them. I use other coding for poetry snippets and such.

You do remind me of something I should have included in the article: ALWAYS turn on the Show feature whenever you’re doing document clean-up, either pre- or post-production. It’s the only way to visually track what it is you’re doing and it could save you from yourself.

Hi Jaye: Is there an HTML code for a soft return? I imagine it might be quite useful in poetry, but I can’t think of another instance where it might be. If you know of any, please share. I enjoy learning from you and this invaluable blog.

A self-closing break tag works for Kindle. (WordPress is driving me nuts. It really, really, really wants to treat whatever I type as a command.)
Less-than symbol, br, space, forward slash, greater-than symbol (that is, <br />).

I hope that’s not too confusing.

Kindlegen has no problem interpreting it. However, EpubCheck will kick it as an invalid command. If you use soft returns in Word and run it through MobiCreator or Calibre, it will work sometimes, and sometimes it won’t. I don’t know the hows and whys of it. All I know is, it causes enough problems sometimes that I avoid it. For poetry I use a paragraph class:

p.quote {
    text-indent: 0;      /* no first-line indent */
    margin: 0 3.5em;     /* inset the block on the left and right */
    text-align: left;    /* keep justification from stretching the lines */
    padding: 0;
    font-style: italic;
}

That sets off the poetry in a nice block, and the left alignment prevents it from being stretched out of shape by justification. It’s a bit of a pain doing this line by line, but at least I can control the output. If I were doing an entire book of poetry, I’d definitely come up with something else.

Thanks for the code. It looks very similar to the code I use for extracts or letters in my books. When I read HTML, I’m always reminded of the fact that programmers invented it, not typesetters, and some of the terminology grates … padding and margins, for instance. I have to leave my typesetting hat at the door when dealing with it.

Don’t blame HTML on programmers. It was invented by Tim Berners-Lee, who was trained as a physicist. And he had actually worked on typesetting systems before he created the web and HTML.

The things that are bothering you are really CSS, which was invented by Håkon Lie. The brilliant part of CSS is the C for cascading. Unfortunately, all the major ereaders deliberately violate the rules of the cascade because the big publishing companies can’t be bothered to learn web technologies. You should remember that CSS was designed for the web, not for the printed page. Most of the differences between CSS concepts and typesetting concepts are really very necessary. Håkon has done some really cool work extending CSS to work with page-oriented documents, but the powers that be in the ebook world seem determined to ignore all that and go chasing after apps.

Thanks for the heads up, William Ockham. I always enjoy reading your posts, here and on Passive Voice. You are the ‘voice’ of reason in so many things and I learn a lot from you, as I did just now. But I’m an old-hat typesetter. That said, I’ve learned that it’s best to learn new tricks. After much reading and asking questions, I now know enough about HTML/CSS to produce nice looking Mobi and ePub books (including one that has footnotes and links), but I still get my knickers in a twist when it comes to ‘margins’ of a print book that I’m used to and the margins that a programmer has determined belong in CSS.

As for padding?

Among other things, it’s a bunch of empty paragraphs that writers use to flesh out a story that doesn’t meet its target page count, and the cotton batting that quilters sandwich between the quilt top and its backing.

There are times when terminology trips us up and we don’t see it coming.