[Old Thread] Capitalize first word in sentence with search and replace?

Hi Folks,

I've converted a .PDF file to .epub and was able to remove the headers and footers with only a little difficulty.

I notice, however, that after the conversion, a lot of the capitalization at the beginning of sentences has been lost (unrelated to headers and footers), which is rather annoying.

It occurred to me to use a regex to locate lower case chars at the start of sentences. Initially, I could think of two cases:

1) First character in the sentence after a paragraph break. Can locate with "\.<br>\s+[a-z]"

2) First character in the sentence in the middle of a paragraph, assuming the previous sentence ends with a period and is followed by one space. Can locate with "\. [a-z]".
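
To see what those two searches actually catch before changing anything, they can be tried outside the book first. A rough sketch using Python's re module, with the two patterns copied unchanged from above; the helper name and the sample text are made up for illustration, and Python's regex flavour may differ in details from the one in Calibre:

```python
import re

# The two search patterns from the post, unchanged.
AFTER_BREAK = re.compile(r"\.<br>\s+[a-z]")
MID_PARAGRAPH = re.compile(r"\. [a-z]")

def find_lowercase_starts(text):
    """Return (offset, matched_text) pairs for both cases, sorted by offset."""
    hits = []
    for pattern in (AFTER_BREAK, MID_PARAGRAPH):
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(0)))
    return sorted(hits)

# Hypothetical sample text, including an abbreviation ("Mr. smith").
sample = "It rained. the road flooded.<br>\n we stayed home. Mr. smith called."
for offset, matched in find_lowercase_starts(sample):
    print(offset, repr(matched))
```

Note that the second pattern also fires after abbreviations such as "Mr.", so the hits are worth a look before doing a blanket replace.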

My question is, what should I put in the replacement text box to make Calibre replace the lower case char found by the search regex with its upper case equivalent?

At first I just tried "\.<br>\s+[A-Z]" and "\. [A-Z]", but the replacement just took those literal text strings and wrote them into the book, so that, for example, every sentence beginning with a lower case character in the middle of a paragraph now begins with "\. [A-Z]" rather than the correct letter.
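
A character class such as [A-Z] only means something on the search side, which is why it was written into the book literally. In regex tools that allow it, the usual way around this is to capture the letter in a group and upper-case it with a function. A minimal sketch in Python's re module (this is not Calibre's replace box; the function name and sample text are assumptions for illustration):

```python
import re

# Group 1: the sentence boundary (either case from the post).
# Group 2: the lower case letter that should be capitalized.
SENTENCE_START = re.compile(r"(\.<br>\s+|\. )([a-z])")

def capitalize_sentence_starts(text):
    """Upper-case the letter after each matched boundary, keeping the boundary."""
    return SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

# Hypothetical sample text.
sample = "It rained. the road flooded.<br>\n we stayed home."
print(capitalize_sentence_starts(sample))
```

The boundary is put back unchanged from group 1, so only the single letter in group 2 is altered.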

I am not a script wizard, so I can't write the perfect solution directly, but it seems you would need a script for this unless you repeat the operation for every letter in the [A-Z] range.
Another trick would be to convert the text to an xls file, use the capitalize function, then convert back to a text file, though you'd have to think about how to keep the paragraphs and chapters intact in the process.

Thanks guys. Yes, I eventually did download Sigil and take a look and was able to do it there. Sigil is a bit more inconvenient to use, or at least it seems so to me, but then I haven't put any time into learning it yet.

It would seem then that Calibre's search and replace function cannot use back references to a group in the initial search regexp?

Install the "Open With" plugin, assign "Open with Sigil" to a keyboard shortcut and you will be way ahead of anything you can do in calibre when it comes to search and replace - at least for working with EPUB. I assign Alt+E in my case, and it is second nature to do any editing that way in Sigil, be it CSS tweaks, find/replace operations, TOC manipulations, etc.

Personally, my toes curl every time I see one of these threads and look at the hoops people are jumping through to try to use the calibre S&R. It certainly has a purpose if you are not working with EPUBs (such as stripping PDF headers/footers as part of a conversion to EPUB), but I would never ever use it for anything outside of that. I would always recommend that someone convert to EPUB (if not already), do their editing using either Sigil or the Tweak ePub/HTML editor, and then convert to their target format if EPUB isn't the end game.

FWIW, to capitalize the first word in a sentence (using Sigil) is quite straightforward. For the example below, I would have defined CSS classes "indentoff" (whatever) for the first sentence following a scene break, and "caps" to transform a string to capital letters. So:

Find:
<p class="indentoff">(.*?)[space]

will find the first text string after "indentoff" followed by a space. Note: [space] here represents a typed space; it is not part of the Regex!
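
To check what that Find expression captures, here is a small sketch in Python's re module. Sigil uses its own regex engine, so treat this only as an illustration of the lazy group; the sample paragraph is made up, and [space] is written as a real space:

```python
import re

# The Find expression from the post, with [space] typed as a literal space.
FIND = re.compile(r'<p class="indentoff">(.*?) ')

# Hypothetical paragraph following a scene break.
html = '<p class="indentoff">the storm broke at dawn.</p>'

m = FIND.search(html)
# Group 1 is the lazy (.*?) part: everything up to the first space.
print(m.group(1))
```

Because (.*?) is lazy, it stops at the first space, so group 1 holds just the first word of the paragraph.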