note
ww
<p>There are several twisty corridors here in the Monastery in which demoronizer cobwebs hang from the ceiling; IMO they're well worth exploring for anyone interested in cleaning up the .html produced by <big><b>ANY</b></big> of MS's Word, Excel or other supposedly WYSIWYG products. Look under the covers, and what you get is remarkable bloat and non-conformant code.</p>
<p>So, a few keywords for future Super_Searchers: "HTML, html, MS, Microsoft, Office, Word, Excel, FrontPage, PowerPoint, Publisher, cleanup, parse" ...and there surely could be more (arguably even Notepad, which when in word-wrap mode adds MS-ish line ends at every displayed wrap position).</p>
<p>[davidrw] and [astroboy] offered links to useful alternate tools in [id://457280]. There is also a bit of discussion re the issues implied in [samtregar]'s remark in this thread.</p>
<p>Self-updating of demoronizer is laid out very nicely by [derby] in [Re^3: Reg Ex to strip MS smart quotes].</p>
<p><b>But</b> (<i>... sigh!</i>)... even the latest Word->html output does not exactly demonstrate that the allegedly-enlightened giant in Redmond has learned to avoid making the same mistakes in different (i.e., incompatible) ways.</p>
<p>...and, oh yes, a (deprecated) disclaimer: I don't hate W32; I just hate cleaning up MS .html to W3C standards.</p>
Fair warning, also: I should probably use a sig like <b><small>html 4.01 dinosaur</small></b>
524176
524238