This is a plain text file. I am writing the script in PowerShell because I am a bit more familiar with handling text files in PS, but I can do this in VBScript or Python if that would be easier.

This file contains a number of lines, and there are many tabbed areas. I was able to remove the tab spacing from the file to clean it up a bit, and I was also able to pull out the lines that matched a specific string. But then the wrench happened: a large number of lines have carriage returns within the string contents. Here are my goals:

String to match is "AUTHORITY-CHECK OBJECT"

1) Remove Empty Tab Spacing - done

Code:

# Delete every run of two or more spaces, then save the result as a new file
(Get-Content $FileName) -replace ' {2,}', '' | Set-Content $NewFileName
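Because Python was also on the table, here is a rough Python equivalent of that one-liner for comparison. It is only a sketch: `collapse_spacing` and `clean_file` are hypothetical helper names, and UTF-8 is an assumed encoding for the file. Like the PowerShell version, it deletes each run of two or more spaces outright rather than shrinking it to a single space.

```python
import re


def collapse_spacing(text: str) -> str:
    """Delete every run of two or more spaces, like -replace ' {2,}', ''."""
    return re.sub(r" {2,}", "", text)


def clean_file(src: str, dst: str) -> None:
    # Read the original file, strip the space runs, and write a new copy.
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(collapse_spacing(text))
```

It is worth checking the output once. Because the replacement is an empty string, any two fields separated only by a run of spaces get glued together (`"AUTHORITY-CHECK    OBJECT"` becomes `"AUTHORITY-CHECKOBJECT"`). Replacing the runs with `" "` instead would keep one space between them.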

I am using variables in place of the file names because the paths are long, and I will eventually replace them with a user input box of sorts.

2) Remove carriage returns after "AUTHORITY-CHECK " if the next line begins with OBJECT or there is nothing after AUTHORITY-CHECK

So yes, the carriage returns are mucking up my results. The information will be used to show our devs that their fixes to the app are not working, so I want to make sure all lines are included. My first results, before I noticed the carriage returns, showed 192 lines. Going back, there are about 326 total lines containing "AUTHORITY-CHECK".
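One quick way to see how many statements are broken is to count two things: lines that contain "AUTHORITY-CHECK" at all, and lines that contain the full "AUTHORITY-CHECK OBJECT". If every AUTHORITY-CHECK in the file is meant to be followed by OBJECT (an assumption), the difference is the number of broken lines, and it should drop to zero once they are rejoined. Below is a minimal Python sketch. The function names and sample lines are invented for illustration.

```python
def count_containing(lines, needle):
    """Count the lines that contain needle anywhere in them."""
    return sum(1 for line in lines if needle in line)


def break_report(lines):
    """Return (lines with AUTHORITY-CHECK, lines with the full match, difference)."""
    total = count_containing(lines, "AUTHORITY-CHECK")
    complete = count_containing(lines, "AUTHORITY-CHECK OBJECT")
    return total, complete, total - complete


sample = [  # invented sample data
    "AUTHORITY-CHECK OBJECT 'Z_A'.",
    "  AUTHORITY-CHECK",
    "    OBJECT 'Z_B'.",
]
print(break_report(sample))  # (2, 1, 1)
```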

I'm just learning to program, so pardon my ignorance if I'm way off here, haha, but is there an equivalent to Python's rstrip() in PowerShell? With rstrip() in Python, you can strip all trailing whitespace off the end of a string, or specify which characters to remove, which in this case would be carriage returns ("\r" in Python).
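To make the two forms of rstrip() concrete, here is a small Python example. The sample line is made up, and it ends in a space plus a CRLF.

```python
line = "AUTHORITY-CHECK \r\n"

# No argument: strip every trailing whitespace character (space, tab, \r, \n, ...).
no_ws = line.rstrip()

# With an argument: strip any trailing mix of only those characters.
no_crlf = line.rstrip("\r\n")

print(repr(no_ws))    # 'AUTHORITY-CHECK'
print(repr(no_crlf))  # 'AUTHORITY-CHECK ' (the space survives)
```

rstrip() only touches the end of the string. lstrip() and strip() cover the start and both ends. In PowerShell, the .NET string methods TrimEnd(), TrimStart() and Trim() fill the same roles. Also, when Python opens a file in text mode with the default settings, it already translates `\r\n` to `\n`, so `rstrip("\n")` is usually enough there.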

I think Trim() can do it, but I haven't found good examples of how to use it. TrimStart() removes characters from the start of a string, and TrimEnd() is the equivalent for the end. The initial code I use to clean the first file works well enough. The part I got hung up on is finding the carriage returns that appear after my string value. I want to either replace them with a space or pull the next line in as part of the whole string.

Basically, a series of fields follows the string I am searching for. When there is a carriage return, that set of data drops to the next line.

So I want to kill the CRLF at the end of the first line and join that line with the second line, or otherwise make OBJECT be seen as part of the first line.

I could toss this in Python, but I know PS much better. Either way, I will have to install Python or PS on my coworker's system so he can run these scripts. I was pondering VBScript for this since it is supported natively.

I haven't gotten to working with files yet in my programming studies, but can you identify the line numbers of the lines where you find "AUTHORITY-CHECK<CRLF>"? For example, as your script searches through the text file, it finds "AUTHORITY-CHECK<CRLF>" on line 5. If so, maybe you can check line number + 1 to see whether the next line (5 + 1) begins with "OBJECT..." (in other words, whether the line after the one matching "AUTHORITY-CHECK" starts with "OBJECT"). If it does, strip the carriage return off the "AUTHORITY-CHECK" line and print both lines together as one (print(line 5, 5+1)).
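The line-number idea translates directly into an index-based loop. Look at line i and peek at line i + 1. When the pair matches, emit the two lines joined and skip ahead by two. Below is a hedged Python sketch. The function name and sample lines are invented, and it assumes the lines have already been read into a list. It also records the 1-based numbers of the lines where joins happened, which could help show the devs exactly where the breaks were.

```python
def join_broken_lines(lines):
    """Join a line ending in AUTHORITY-CHECK with the next line if it starts with OBJECT.

    Returns (joined_lines, line_numbers), where line_numbers are the 1-based
    numbers of the AUTHORITY-CHECK lines that were joined.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    result, joined_at = [], []
    i = 0
    while i < len(lines):
        current = lines[i]
        has_next = i + 1 < len(lines)
        if (has_next
                and current.rstrip().endswith("AUTHORITY-CHECK")
                and lines[i + 1].lstrip().startswith("OBJECT")):
            result.append(current.rstrip() + " " + lines[i + 1].lstrip())
            joined_at.append(i + 1)  # 1-based, like an editor's line numbers
            i += 2                   # the OBJECT line has been consumed
        else:
            result.append(current)
            i += 1
    return result, joined_at


sample = [  # invented sample data
    "other text",
    "  AUTHORITY-CHECK",
    "    OBJECT 'Z_EXAMPLE'.",
    "AUTHORITY-CHECK OBJECT 'Z_OTHER'.",
]
fixed, where = join_broken_lines(sample)
print(where)  # [2]
```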

I hope that makes sense. I'm studying my Python material right now, and the stuff I'm learning has me thinking about this thread again, haha.

I think it would be easiest to store the string in a temp variable until the next line is read. If the next line is AUTHORITY-CHECK, the held line is complete, so write it out (or do whatever you need with what's in the temp variable). Alternatively, if the next line is OBJECT, concatenate it with the temp variable and then write that out (or, again, do whatever further processing is required). This is similar to what lorddicranius said.
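Here is one way to write the temp-variable approach as a Python sketch. The names are hypothetical. It holds every line back until the next one arrives, and it treats any following line other than OBJECT as proof that the held line is complete, not just a following AUTHORITY-CHECK line.

```python
def join_with_buffer(lines):
    """Yield complete lines, holding each one back until the next line is seen."""
    held = None  # the "temp variable"
    for raw in lines:
        line = raw.rstrip("\r\n")
        if (held is not None
                and held.rstrip().endswith("AUTHORITY-CHECK")
                and line.lstrip().startswith("OBJECT")):
            # Next line is OBJECT: concatenate it onto the held line.
            held = held.rstrip() + " " + line.lstrip()
            continue
        if held is not None:
            yield held  # the next line is not OBJECT, so the held line is complete
        held = line
    if held is not None:
        yield held  # flush the last line at end of input


sample = [  # invented sample data
    "other text",
    "  AUTHORITY-CHECK",
    "    OBJECT 'Z_EXAMPLE'.",
    "AUTHORITY-CHECK OBJECT 'Z_OTHER'.",
]
for line in join_with_buffer(sample):
    print(line)
```

Because it is a generator that only ever holds one line, it can also take an open file object directly and stream through a file of any size.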

Edit: I'm not that familiar with PowerShell. Can you read the entire file into a single variable instead of line by line? If so, you could do that first and then replace "`r`nObject" with "Object".
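Reading the whole file into one string does work. In PowerShell, `Get-Content -Raw` (PowerShell 3.0 and later) or `[System.IO.File]::ReadAllText()` returns the file as a single string, and `-replace` can then target the line break directly. The same idea is sketched in Python below. The pattern and function names are assumptions, as is UTF-8 for the file encoding. Here the break and any indentation around it are replaced with a single space, so the result reads "AUTHORITY-CHECK OBJECT" instead of running the two words together.

```python
import re

# AUTHORITY-CHECK, optional trailing spaces/tabs, a line break (\r\n or \n),
# optional indentation, and then OBJECT (checked with a lookahead, not consumed).
BROKEN = re.compile(r"AUTHORITY-CHECK[ \t]*\r?\n[ \t]*(?=OBJECT)")


def join_whole_text(text: str) -> str:
    # Replace the break and the whitespace around it with a single space.
    return BROKEN.sub("AUTHORITY-CHECK ", text)


def fix_file(src: str, dst: str) -> None:
    # newline="" keeps \r\n untranslated, so the output keeps the file's endings.
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(join_whole_text(text))
```

Note that Python's `re` is case-sensitive by default, while PowerShell's `-replace` is case-insensitive.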

Last edited by dynamik on Wed Jan 25, 2012 6:46 pm, edited 1 time in total.

Thanks, guys. The logic seems right; now it just needs to be put into code. As I understand it, dynamik, once the Get-Content cmdlet is used on a file, you can pipe commands together, so essentially you would do:

gc filename | foreach line in file matching AUTHORITY-CHECK[CRLF] remove CRLF and combine with next line.

That's not how it would actually be coded, but that is the logic, and then you can add:

| set-content NEWFILE

Alternatively, I can probably toss in some IF/THEN statements to capture the broken lines.

IF line = AUTHORITY-CHECK`r`n THEN -replace "`r`n", " " AND combine with next line
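The IF/THEN idea can also be phrased as follows: append each line, unless it is an OBJECT line following an AUTHORITY-CHECK line, in which case merge it into the previous one. Below is a hedged Python sketch of the whole pipeline described above: read, join, keep only the matching lines, write. It roughly mirrors `gc | foreach | set-content`. The function names, the script name in the usage comment and the UTF-8 encoding are all assumptions.

```python
import sys

MATCH = "AUTHORITY-CHECK OBJECT"  # the string the thread is searching for


def merge_continuations(lines):
    """IF the previous kept line ends with AUTHORITY-CHECK and this line starts
    with OBJECT, THEN merge this line into it; ELSE keep this line as-is."""
    out = []
    for line in lines:
        if (out and out[-1].rstrip().endswith("AUTHORITY-CHECK")
                and line.lstrip().startswith("OBJECT")):
            out[-1] = out[-1].rstrip() + " " + line.lstrip()
        else:
            out.append(line)
    return out


def extract(src, dst):
    """Read src, repair the broken statements, write only matching lines to dst."""
    # Like Get-Content: one string per line, line endings removed.
    with open(src, encoding="utf-8") as f:
        lines = f.read().splitlines()
    # Like the foreach step plus a match filter.
    matches = [line for line in merge_continuations(lines) if MATCH in line]
    # Like Set-Content: one output line per element.
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in matches)
    return len(matches)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python extract_checks.py INPUT OUTPUT
    print(extract(sys.argv[1], sys.argv[2]), "matching lines written")
```

The returned count gives a quick check against the roughly 326 "AUTHORITY-CHECK" lines mentioned earlier.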

or something like that. I might work on it later today, since that always makes the day go quicker. I got frustrated when I hit the carriage return problem and realized I was spending too much time on this issue when I had other, higher-priority items to work on (though not quite as exciting).