EDITED 7-4-12 simply to note that the macro appears, from a small test, to continue to work with Apache OpenOffice's new implementation of regular expressions.

This is the same code stored at OOoForum.org. I worked too hard on it to risk its potential loss. I think the maintainer of that forum will always act responsibly, but he may die.

This macro removes the excess paragraph breaks from a file created in an ASCII text editor. It also works on text copied & pasted from the Web that has line breaks inserted, like a message in these forums.

It also provides other options: indent each paragraph, reduce the spacing between paragraphs, change spaced indents to tabs, remove excess interior spacing, strip all indents and justify the results. One routine in this macro is designed to reformat text that has been scanned and then OCRed directly into OOo. If your OCR program creates a left margin by inserting spaces before each line then you might find this handy.

Trying to reformat an ASCII file is an exercise in guesswork at best and you shouldn't expect perfect results. This macro has to make assumptions about what is truly the end of a paragraph, a title or part of a list.
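
To give a feel for the kind of guess involved, here is a minimal standalone Python sketch. It is not the macro's code, only an illustration of one such heuristic: a line that stops well short of the right margin is assumed to end a paragraph. The function name, the assumed right margin "width" and the sample text are hypothetical; "short_def" plays a role similar to the macro's ShortDef variable.

```python
def join_wrapped_lines(text, width=72, short_def=20):
    """Join hard-wrapped lines into paragraphs (illustrative heuristic only).

    A paragraph is assumed to end at a blank line, or at a line that stops
    at least `short_def` characters short of the assumed margin `width`.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if not stripped:
            # A blank line always closes the paragraph being collected.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(stripped)
        if len(stripped) <= width - short_def:
            # Short line: guess that this is a title, list item or paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "aaaa bbbb cccc dddd\neeee ffff gggg hh\n\nTitle\niiii jjjj kkkk llll\nend"
for paragraph in join_wrapped_lines(sample, width=20, short_def=10):
    print(repr(paragraph))
```

A title such as "Title" survives as its own paragraph only because it happens to be short. A long title, or a paragraph whose last line happens to reach the margin, would be guessed wrong, which is why perfect results should not be expected.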

'Version 2.2 5-6-06 John Vigor
'Converts ASCII text files, or selected text within them, by stripping out excess
'paragraph breaks. Works with items copied & pasted from the Web that contain line
'breaks such as a message in these forums.
'WARNING - Anything stored on the Clipboard will be overwritten. A copy of your
'original file will not be saved to the Clipboard if the file is greater than 60K
'characters or if you process selected text of any size, and you are responsible for
'otherwise backing it up. On the other hand, your original file will not be changed unless
'you save the macro results and overwrite the original but this assumes a saved file.
'Sample processing times on a 770 MHz machine in Pages/Seconds format:
' 10/4, 20/7, 40/16, 80/34, 160/81 (1.35 Min.).
'Hint for long documents - TURN OFF AUTO-SPELLCHECKING. This will save time.
'You can control what items the program asks you about and what happens if you choose
'not to be asked about an item by editing the variables below. These variables are
'ignored if you run the macro on selected text so if you customize them and need to
'be asked about items for a particular file then simply select the entire file with
'Ctrl+A before running the macro.
'You should become familiar with how the program works before you attempt to customize
'it. It works differently if the file is less or more than 60K and when processing a
'selection.
'
Sub ASCII_Formatter_StartHere
'VARIABLES YOU CAN CHANGE.
AskShortParagraphs = True 'Show the query about keeping short paragraphs.
  'Default answer if AskShortParagraphs is False.
  KeepShortParagraphs = True 'HIGHLY recommended and the faster of the two methods. I
  'only maintain the other method because it's needed for one option.
'--
AskShortParaLength = True 'Show a chance to adjust the program's estimate of what should
'be considered a short paragraph that will be kept.
ShortDef = 20 'The minimum number of characters short of the right margin that a line
'must end to be a short paragraph. The paragraph break at the end of any short paragraph
'will be maintained.
'--
AskViewOptions = True 'Show the query about end of program Options.
  'Default, if AskViewOptions is False.
  GoToOptions = True 'If True the Option variables below will take control. They will
  'also control if AskViewOptions is True and you choose to go to the Options.
'--
'The following values control what Options are displayed following initial file processing:
Show1stOptionSet = True 'Show the first set of Options, which are Indent All Paragraphs
'and Reduce Paragraph Spacing.
  'Defaults, if Show1stOptionSet is False.
  O1_1 = False 'Indent all paragraphs.
  O1_2 = False 'Reduce paragraph spacing.
  AskMaxParaSpacing = True 'Ask for the maximum number of blank "lines" between paragraphs.
  O1_2_1 = 1 'Default, if AskMaxParaSpacing is False.
Show2ndOptionSet = True 'Options are Remove Excess Interior Spaces, Change Spaced
'Indents to Tabs and Justify All Paragraphs.
  'Defaults, if Show2ndOptionset is False.
  O2_1 = False 'Remove excess interior spaces.
  O2_2 = False 'Change spaced indents to tabs.
  O2_3 = False 'Justify text.
'--
'The following control aspects of full file, selection and/or Option processing and
'are not ignored if you process a selection:
ShowBackUpWarning = True 'Show warning that file not copied to Clipboard if over 60K.
ShowFinished = True 'Show the finished message.
PageBite = 10 'Number of pages processed at a time for selected text or a file over 60K
'characters. Seems pretty good but I haven't seriously tried to optimize it.
ViewBegin = True 'The cursor and your view are taken to the beginning of the document
'when macro ends. "False" will leave you at the end.
StripHyphens = True 'Assumes hyphens located at the right margin are editorial and do
'not separate true hyphenated words like "half-dollar".
CheckForcedLeftMargin = True 'A VERY quick routine if spaces imitating a left margin
'don't exist. Or it will delete such a margin if they do (known to work for my version of
'ScanSoft's OCR software - the 1st line of the file will contain only a series of spaces).
StripIndents = "0" 'This controls what happens during initial file processing. The default
'value is recommended. The default ("0") is to leave all tabbed or spaced paragraph indents,
'"1" will strip spaced indents, "2" will strip tabbed ones and "3" will strip both kinds.
'If you run the macro on selected text and don't choose to go to the Options then you
'will also be asked about this. One use for this is to deal with paragraphs offset, as
'opposed to just 1st line indented, from the left margin. Normal processing will leave
'all "lines" of such paragraphs indented and followed by paragraph breaks. Stripping
'out the indents will convert these to regular paragraphs which can be edited normally
'in Writer. Running the macro on selected text after normal processing is a personal
'favorite of mine because I often scan documents with offsets which I need to edit.
MarkIt = True 'Insert the "MarkWith" character as the last character of the file to
'indicate it has been previously processed. Allows you to run the macro again and go
'directly to the other Options without wading through the full initial file processing
'again. I do not recommend changing as it controls how the program works on a 2nd run.
MarkWith = Chr(160) 'A nonbreaking space in most fonts, which isn't seen on printing.
Override60K = False 'Setting this to True may significantly slow runtime on large files
'or selections but no separate processing document will be used if you don't like this.
'.....................................................................................
'DO NOT EDIT BELOW HERE UNLESS YOU KNOW WHAT YOU ARE DOING.
lTime = Timer : RunTime = 0 : Skip = false : Over60K = false : ProcSel = false
PrevProc = false : IsSelect = false : ASPL = AskShortParaLength : LastSection = False
thisDoc = thisComponent : oDoc = thisDoc 'oDoc may change.
thisVC = thisDoc.CurrentController.getViewCursor : oVC = thisVC 'oVC may change.
thisText = thisDoc.Text
MarkSel = thisDoc.Text.createTextCursorByRange(thisVC)'Mark any selection.
thisFrame = thisDoc.CurrentController.getFrame()
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
FandR = thisDoc.createReplaceDescriptor() 'Find & Replace initial set up.
FandR.searchRegularExpression = true 'Use regular expressions.
If NOT thisVC.isCollapsed then IsSelect = true
If IsSelect then
  a$ = "Process selected text only?"& Chr(13) & Chr(13) &"Cancel to Quit."
  RunTime = RunTime + (Timer - lTime)
  iAns = MsgBox(a$,3,"Text in this file has been selected.")
  lTime = Timer : If iAns = 2 then End
  If iAns = 6 then ProcSel = true
EndIf
If ProcSel then 'Tests for over 60K.
  If Len(MarkSel.String) > 60000 OR Len(MarkSel.String) = 0 then Over60K = true
ELSEIf thisDoc.characterCount > 60000 then Over60K = true
EndIF
If Override60K then Over60K = false
If NOT Over60K then LastSection = true
If Over60K AND NOT ProcSel then MarkSel.gotoStart(false) : MarkSel.gotoEnd(true)
thisTC = thisDoc.Text.createTextCursor : thisTC.gotoEnd(false)
thisTC.goLeft(1,true) 'Get the last character. If a binding space
'then file was previously processed. Process or go to Options?
IF thisTC.String = MarkWith then 'Was file previously processed?
  PrevProc = True 'Previously processed.
  Skip = SkipToOptions(Show1stOptionSet,Show2ndOptionSet)
EndIf
'++++++++++++++++++++++++++ All basic information gathered, now control program flow.
Select Case Skip
  Case True 'Go directly to the options.
    Select Case ProcSel 'Do we need to deal with selected text?
      Case False : BackUp(thisFrame,dispatcher)
        RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
      Case True : NoBackUpWarning(ShowBackUpWarning)
        SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
        RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
        FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
    End Select
  Case False 'Going to do the regular file processing.
    Select Case ProcSel 'Do we need to deal with selected text?
      Case True : NoBackUpWarning(ShowBackUpWarning)
        AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
        AskIndents() : If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
        SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
        If Over60K then
          ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
        Else
          RunMainRoutines(FandR)
          FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
        EndIf
        oDoc = thisDoc
        AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
      Case False 'Normally this would be the 1st time entire file is processed.
        'Won't ask about indents because most users won't care.
        IF Over60K then NoBackUpWarning(ShowBackUpWarning)
        If NOT Over60K then BackUp(thisFrame,dispatcher)
        AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
        If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
        If Over60K then
          SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
          ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
          oDoc = thisDoc
        Else
          RunMainRoutines(FandR)
        EndIf
        AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
    End Select
End Select
'++++++++++++++++++++++++++++++
thisVC.gotoEnd(False)
If NOT PrevProc then
  thisText.insertString(thisVC,MarkWith,False)
EndIf
EndMessage(ShowFinished,MarkIt)
If ViewBegin then thisVC.gotoStart(false)
End Sub
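
The StripHyphens comment above describes an assumption rather than a certainty: a hyphen sitting at the right margin is treated as an editorial line-break hyphen. The routine that actually does this is not part of this listing, so the following standalone Python sketch only illustrates that assumption on plain text. The function name and sample strings are hypothetical.

```python
import re


def strip_margin_hyphens(text):
    """Rejoin words that were split with a hyphen at the end of a line.

    Assumption (same as the macro's StripHyphens comment): a hyphen directly
    between a word character and a line break is editorial, so the hyphen and
    the line break are both dropped.
    """
    return re.sub(r"(?<=\w)-\n(?=\w)", "", text)


print(strip_margin_hyphens("an implemen-\ntation detail"))
print(strip_margin_hyphens("a half-\ndollar coin"))
```

The second call shows the cost of the assumption: a true hyphenated word that happens to break at the margin loses its hyphen, so "half-dollar" comes back as "halfdollar".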

Sub ForcedLeftMargin(thisTC,FandR)'Delete a forced left margin appearing
thisTC.gotoStart(false) : thisTC.gotoEndOfParagraph(true)'in some OCRed text files.
Margin$ = String(Len(thisTC.String)," ")
If Len(Margin$) > 0 then
  FandR.setSearchString("^" & Margin$)
  Find = oDoc.findFirst(FandR)
  Do While Not IsNull(Find)'Can't use "replaceAll" because
    Find.String = ""'multiple a$s in one line will be deleted.
    If Find.gotoNextParagraph(false) then
      Find = oDoc.findNext(Find.End,FandR)
    Else
      Exit Do
    EndIf
  Loop
EndIf
End Sub
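
For readers who want to see the idea outside of Writer, here is a rough Python sketch of the same approach on plain text. It is an illustration under assumptions, not a port. As in the Sub above, the length of the first line sets the margin width, and that many spaces are removed once from the start of each line that has them. The function name and sample text are hypothetical.

```python
def strip_forced_left_margin(text):
    """Remove a left margin faked with spaces (illustrative sketch).

    The margin width is taken from the length of the first line, which the
    OCR output is assumed to fill with nothing but spaces.
    """
    lines = text.split("\n")
    margin = " " * len(lines[0])
    if not margin:
        return text  # Empty first line: nothing to measure, nothing to do.
    # Remove the margin once per line, and only where the line starts with it.
    return "\n".join(
        line[len(margin):] if line.startswith(margin) else line
        for line in lines
    )


ocr_text = "   \n   Hello there\n     an indented line\nno margin here"
print(strip_forced_left_margin(ocr_text))
```

If the first line is ordinary text, the margin becomes a long run of spaces that is unlikely to start any line, so the text passes through unchanged. That matches the "VERY quick routine" remark in the macro's comments.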

Sub CleanUp(FandR)
FandR.setSearchString("\n") 'Just in case line breaks got into the doc change
FandR.setReplaceString("\n") 'them to paragraph breaks. This will also help with
oDoc.replaceAll(FandR) 'some text copied & pasted from the web.
FandR.setSearchString(" *$") 'Delete all spaces before paragraph breaks. A must!
FandR.setReplaceString("")
oDoc.replaceAll(FandR)
End Sub
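
The second half of CleanUp, deleting spaces before paragraph breaks, is easy to try on plain text. Below is a small Python sketch of just that step, with a hypothetical function name. The first half (turning line breaks into paragraph breaks) relies on Writer's own meaning of "\n" in search and replace strings and has no direct plain-text equivalent, so it is left out.

```python
import re


def strip_trailing_spaces(text):
    """Delete runs of spaces that sit directly before a line end."""
    # re.MULTILINE makes "$" match before every newline, not only at the end.
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_spaces("one  \ntwo \n\nthree")))
```

This matters for any later step that measures line length or looks for a character "at the right margin", since stray trailing spaces would throw those checks off.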

Sub ReplaceSpacesB4LinesWithTab(FandR as Object)
MaxIndent = 10 'Any indent in excess of this will be ignored.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of line
Find = oDoc.findFirst(FandR) 'replace with tab, to replace with nothing use ""
While NOT isNull(Find)
  If Len(Find.String) <= MaxIndent then Find.String = Chr(9)
  Find = oDoc.findNext(Find.End,FandR)
Wend
End Sub
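
A plain-text Python sketch of the same rule, for illustration only: a leading run of spaces becomes one tab unless it is longer than the maximum indent. The function name is hypothetical, and the pattern uses " +" (one or more spaces) as an assumption so that lines without any indent are left alone.

```python
import re


def spaced_indents_to_tabs(text, max_indent=10):
    """Replace a leading run of spaces with a single tab.

    Runs longer than `max_indent` are left alone, on the guess that they are
    centering or layout rather than a paragraph indent.
    """
    def to_tab(match):
        return "\t" if len(match.group()) <= max_indent else match.group()

    return re.sub(r"^ +", to_tab, text, flags=re.MULTILINE)


sample = "  indented paragraph\n            centered heading\nflush left"
print(repr(spaced_indents_to_tabs(sample)))
```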

Sub DeleteExcessInteriorSpaces(FandR as Object)
SM = Chr(165) 'Space marker.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of paragraph
Find = oDoc.findFirst(FandR)
While NOT isNull(Find)
  Find.String = String(Len(Find.String),SM) 'replace with placeholders
  Find = oDoc.findNext(Find.End,FandR)
Wend
FandR.setSearchString(" *") 'find any number of spaces
FandR.setReplaceString(" ") 'replace with one space
oDoc.ReplaceAll(FandR) 'do it
FandR.setSearchString("^" & SM & "*") 'find any number of placeholders at beginning of line
Find = oDoc.findFirst(FandR) 'turn them back into spaces
While NOT isNull(Find)
  Find.String = String(Len(Find.String)," ")
  Find = oDoc.findNext(Find.End,FandR)
Wend
End Sub
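
The three-pass placeholder trick used here (protect the leading spaces, collapse everything else, restore) can be sketched in Python as follows. This is an illustration on plain text, not a port. The function name is hypothetical, the marker is the same character as Chr(165), and the patterns are assumptions chosen to avoid empty matches.

```python
import re


def delete_excess_interior_spaces(text, marker="\u00a5"):
    """Collapse runs of interior spaces while keeping leading indents.

    Pass 1 swaps each leading space for a placeholder, pass 2 collapses any
    remaining run of two or more spaces, pass 3 turns placeholders back into
    spaces. Assumes no line of the text already starts with the marker.
    """
    protected = re.sub(
        r"^ +", lambda m: marker * len(m.group()), text, flags=re.MULTILINE
    )
    collapsed = re.sub(r" {2,}", " ", protected)
    return re.sub(
        "^" + re.escape(marker) + "+",
        lambda m: " " * len(m.group()),
        collapsed,
        flags=re.MULTILINE,
    )


print(repr(delete_excess_interior_spaces("   lead   in  text\nno  lead")))
```

A single regular expression with a lookbehind could do this in one pass in Python. The three passes are kept here because they show the technique the Sub relies on, including its one weak spot: a line that already begins with the marker character would have it turned into a space.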

Sub EndMessage(ShowFinished,MarkIt)
If NOT ShowFinished then Exit Sub
a$ = "Your original document was saved to the Clipboard and can be retrieved from "
b$ = "there if you do not like the macro results. "
RunTime = RunTime + (Timer - lTime)
c$ = "Total processing time was "& RunTime &" Second(s) or "& RunTime/60 &" Minute(s)."
d$ = "If you want to use the normal end of program options run this macro "
d$ = d$ & "again on the file or a selection. "
If MarkIt then
  e$ = "A previously processed marker has been inserted as the last file character. "
EndIf
If NOT Over60K then
  MsgBox (a$ & b$ & e$ & Chr(13) & c$,0,"FINISHED!")
Else
  MsgBox (d$ & e$ & Chr(13) & Chr(13) & c$,0,"FINISHED!")
EndIf
End Sub
'Version 2.2 5-6-06

Last edited by JohnV on Wed Jul 04, 2012 6:19 pm, edited 1 time in total.

I briefly posted a reply asking for help with a problem that turned out to have "fixed itself", meaning that I do not know what the initial problem was; it was probably some weird formatting problem in pasting code copied from my old machine to my new machine. I deleted the post because the problem doesn't exist, but I would like to say thank you for making this macro available. I've used it now and again for the past several years (to help with text formatting after creating a script application to work with the tesseract OCR binary in 2007 [which I still use on occasion]).

I do have another question, though. Have you ever 'ported' (or thought about 'porting') the functionality of this macro to some other scripting language?

I can confirm that this workhorse macro, which worked beautifully in OO 3.2.1, also works well with LibreOffice 3.5.4.2.

One hint: The way I've always used this snippet is to copy and paste the ASCII text into an OO/LO document. Then, before running the macro, I save and close that document and reopen it. If I don't go through this save/close/reopen process, the macro does not work properly.

Last edited by Digger on Sun Oct 15, 2017 9:10 pm, edited 1 time in total.

As LibO V3.5.4 was mentioned, and the most recent released version of LibO is V 5.4.2, I did a very superficial short test which seemed to show that the huge "snippet" also works there roughly as expected.

One hint: If applied to a 'Writer' document containing frames in addition to the imported text, the Sub obviously tries to work also with the text content of the frames, but does not concatenate the paragraphs therein as expected.