The way I have it now I'm using split, foreach's, whiles nested in whiles, ifs, character arrays, etc. and it's over 90 lines of actual code. It just runs slow. The files I work on are 200,000+ lines, text, fixed-width fields, and sent by clients who probably make them in Excel in parts. Northeast and Puerto Rico zips start with 0 or 00.

I KNOW it's possible to cut it at least in half using regexes, but I can't figure out an efficient way. Can you help me? Thanks!

Yeah, well, it's not "fixed" permanently. Each file will have a different position (any integer) and length (5 or 9 only), but they will be the same throughout the file. That's why they're command line args.

Although I think you could put unpack() to good use, since the length of the lines will be either 15 or 19 characters.

It's just the substring zip that will either be 9 or 5 characters.

Actually the lines can have any arbitrary size (1,000+ bytes!) and any number of fields at whatever size the clients feel like sending me. They're not techy, so they'll have data in a spreadsheet and send it to me, sometimes in a spreadsheet, sometimes in text, sometimes delimited, sometimes fixed width, sometimes (grrr) with no EOLs, so I have a text file with one line that is 190,000,000,000 or so bytes (it's probably an unconverted Mac file).
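If the "one long line" files really are old Mac files (that is a guess in the post above, not something confirmed), the records end in a bare carriage return, and Perl can be told to split on that instead of converting the file first. A minimal sketch, assuming CR-only line endings; `read_cr_records` and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: read records from a filehandle whose
# line ending is a bare carriage return (classic Mac OS style).
sub read_cr_records {
    my ($fh) = @_;
    local $/ = "\r";              # input record separator, restored on return
    my @records;
    while (my $record = <$fh>) {
        chomp $record;            # chomp strips the current value of $/
        push @records, $record;
    }
    return @records;
}

# Made-up sample data read through an in-memory filehandle.
my $sample = "02134 Boston\r00601 Adjuntas\r";
open my $fh, '<', \$sample or die "open: $!";
print "$_\n" for read_cr_records($fh);
```

For a really large file it would make more sense to process each record inside the loop than to collect them all in an array; the array is only here to keep the sketch short.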

I do need this to be sort of readable, but thanks for your super efficient method.

Hard to know what to suggest if the lines can be any arbitrary size. There could be many many false matches in a file that could be one long line. There has to be a rule or a set of rules that can be applied to the problem, if not, it will take a lot of filtering and double-checking to add those missing zeros.

I assumed the lines were what you posted.

KevinR, you're making it more difficult than it really is. There don't need to be any rules: I open the file in vim and see for myself what position the zip code is in.

Let's just start at the point where I have a 5 (or 9) character (digits) field in each record, and some will have trailing spaces. For x trailing spaces, shift 5 (or 9) - x characters in the field to the right x times, and insert x zeroes in the left.

The steps as I see them (9 digit field length, _ stands for a space):

1254854__ (2 spaces at the end)
11254854_ (shift 9-2=7 chars to the right once)
111254854 (twice)
011254854 (insert one zero)
001254854 (and a second zero)
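The shift-and-insert steps above boil down to "strip the trailing spaces, then put that many zeros on the left", which needs no loops or character arrays. A minimal sketch, assuming the field holds only digits followed by spaces; `fix_zip` is a made-up name, not code from this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: restore the leading zeros of a zip field
# that was left-justified and padded with trailing spaces.
sub fix_zip {
    my ($field) = @_;
    my $width = length $field;              # 5 or 9 in this thread
    (my $digits = $field) =~ s/\s+$//;      # drop the trailing spaces
    return $field unless $digits =~ /^\d+$/;   # anything unexpected is left untouched
    return ('0' x ($width - length $digits)) . $digits;
}

print fix_zip('1254854  '), "\n";   # prints 001254854
print fix_zip('2134 '), "\n";       # prints 02134
```

A field that is already full width comes back unchanged, since the zero count works out to 0.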

Like I said, using regex there has to be a faster way to do it, and all of your replies are leading me there.

I'll know exactly where the zip code field will be, no worries about that. I only work on one file at a time, and that's also the main reason I have command line args (that I will literally type myself):

and I have, in $zipCode, a 5- or 9-byte length string that may or may not need to be fixed. This part is solved already. And yes I will also do

Code

substr($line,$zipStart,$zipLength) = $zipCodeFixed;

to reinsert the corrected zip back in its place.
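Putting the extraction, the fix and the reinsertion together could look roughly like this. It is a sketch under assumptions: the two arguments are taken to be a 0-based start offset and a length (matching how substr() counts), the script name in the comment is invented, and `fix_line` is a made-up helper rather than the code discussed earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: fix the zip field of one record in place.
sub fix_line {
    my ($line, $zipStart, $zipLength) = @_;
    return $line if length($line) < $zipStart + $zipLength;   # short record: leave alone
    my $zipCode = substr($line, $zipStart, $zipLength);
    if ($zipCode =~ /^(\d+)( +)\z/) {                         # digits, then trailing spaces
        # Same number of characters goes back in, so the record width is unchanged.
        substr($line, $zipStart, $zipLength) = ('0' x length $2) . $1;
    }
    return $line;
}

# Assumed invocation: perl fixzip.pl START LENGTH < in.txt > out.txt
if (@ARGV == 2) {
    my ($zipStart, $zipLength) = @ARGV;
    while (my $line = <STDIN>) {
        print fix_line($line, $zipStart, $zipLength);
    }
}
```

Because the replacement is always exactly $zipLength characters long, the fixed-width layout of the rest of the record should not shift.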

The fixing is all I really need help on. I tried your code and it almost works, just for some reason it changes the field length in the output file to 7 (I didn't see any numbers in your code that would do that) and it doesn't add zeroes or remove trailing spaces. I think I can figure the rest out on my own though, thanks for your help. I'll post back during the work week if I need more help. Thanks again!