I'm trying to do a narrow search. I've converted some PDFs to ePub, but some paragraphs are broken up: one ends halfway through a sentence and the next paragraph continues it.

I want to search for any 2 characters followed by </p>, but not find .</p>, ."</p>, ?</p>, ?"</p>, !</p> or !"</p>, as those should be proper sentence enders.

Right now I have [^.,^\?,^\!][a-z,A-Z,”,\,, ,+]</p> and it seems to work, but is there a simpler way of doing this?

When I'm joining split sentences like that, I search for a lowercase letter after the <p> tags...

search: </p>\s*<p>([a-z])
replace: _\1
(Note the space before \1, shown above as an underscore.)
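
For anyone who wants to try the same search and replace outside an editor, here is a minimal sketch in Python, assuming the replacement is a single space followed by \1. The function name join_split_paragraphs and the sample text are made up for illustration.

```python
import re

# Pattern from the post: a paragraph break followed by a lowercase letter.
SPLIT = re.compile(r"</p>\s*<p>([a-z])")

def join_split_paragraphs(html: str) -> str:
    """Replace each matched break with one space, keeping the captured letter."""
    return SPLIT.sub(r" \1", html)

# Hypothetical sample: the first break splits a sentence, the second does not.
sample = "<p>The cat sat on</p>\n<p>the mat.</p>\n<p>A new paragraph.</p>"
print(join_split_paragraphs(sample))
```

Only the first break is joined, because the paragraph after the second break starts with a capital letter.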

Hope that helps.

It helps, but I think I have to do as huebi suggested and do several sweeps, looking for different things each time, because I found that not all sentence splits end with a lowercase letter. Some ended with quotes but no period, some with a comma, and some with a question/exclamation mark but no quotes, because the person speaking was still speaking and the speech continued in another paragraph.

Basically I was trying to catch all of that in one go. Looking at other examples, I think I have too many commas separating things that don't need it.

[^.^\?^\!][a-zA-Z”\,\?\!+]</p>

Should find ?</p> but not ?”</p>, and z”</p> but not z.”</p>

Just tested this with the following (found / skipped):
<p>x?</p> found
<p>x.”</p> skipped
<p>x?</p> found
<p>X!</p> found
<p>x!”</p> skipped
<p>x,</p> found
<p>x,”</p> found
<p>x”</p> found

EDIT: The above can be simplified further: [^.?!][a-zA-Z”,?!]</p>
So basically we now have: if any of the 3 characters excluded by [^.?!] appears before one of these characters [a-zA-Z”,?!] (specifically the ”) and </p>, then that find is skipped.
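
A quick way to check the simplified expression is to run it over a few test paragraphs. This is only a sketch in Python. The helper name is_flagged and the sample list are assumptions for illustration, and the last sample (x.</p>) is an extra case added here.

```python
import re

# Simplified expression from the EDIT above.
PATTERN = re.compile(r"[^.?!][a-zA-Z”,?!]</p>")

def is_flagged(paragraph: str) -> bool:
    """True if the paragraph ending looks like a split sentence."""
    return PATTERN.search(paragraph) is not None

samples = [
    "<p>x?</p>", "<p>x.”</p>", "<p>X!</p>", "<p>x!”</p>",
    "<p>x,</p>", "<p>x,”</p>", "<p>x”</p>", "<p>x.</p>",
]

for s in samples:
    print(s, "found" if is_flagged(s) else "skipped")
```

The paragraphs ending in .” and !” are skipped because the first class rejects the . or ! in front of the closing quote, and x.</p> is skipped because a period is not in the second class.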

Working on a book this morning, I found that after stripping out all the class, style and useless spans/divs I was once again left with broken-up sentences like this:
<p>this is</p>
<p>part of a</p>
<p>paragraph.</p>

<p>&nbsp;</p>

So I came up with:
FIND: ([a-z,’”.?!-])</p>\n\n\s\s<p>([a-z,A-Z“-])
REPLACE: \1 \2

\n = new line
\s = white space

All the <p>&nbsp;</p> are ignored and then I just strip them out when all the paragraphs are back together.
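
The two steps (join, then strip the spacer paragraphs) could be sketched in Python roughly like this. It is an illustration, not the exact editor search: it assumes \s* in place of \n\n\s\s so it does not depend on the file's exact blank lines and indentation, and the names JOIN, SPACER and rejoin are made up.

```python
import re

# Character classes copied from the FIND above; \s* replaces \n\n\s\s (assumption).
JOIN = re.compile(r"([a-z,’”.?!-])</p>\s*<p>([a-z,A-Z“-])")
# Spacer paragraphs to strip once everything is joined.
SPACER = re.compile(r"<p>&nbsp;</p>\s*")

def rejoin(html: str) -> str:
    """Join broken-up paragraphs, then remove the <p>&nbsp;</p> spacers."""
    joined = JOIN.sub(r"\1 \2", html)
    return SPACER.sub("", joined)

sample = (
    "<p>this is</p>\n"
    "<p>part of a</p>\n"
    "<p>paragraph.</p>\n"
    "<p>&nbsp;</p>\n"
)
print(rejoin(sample))
```

The spacer is never joined because & is not in the second character class. Since the first class includes . ? and !, two paragraphs with no spacer between them are joined even when the first one ends a sentence, so this fits files where the real paragraph breaks are the <p>&nbsp;</p> lines.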