Update a file using another file

Romik#1 / 18

Please ignore my other posts on this subject.

Howzit

I have been trying to write a script that compares two similar files and applies the changes. I have been trying for two days and cannot get it right. FileA is the table that needs to be updated and FileB is the table with the latest records. The third field in FileA (cifa) needs to be compared with the first field in FileB (cifb), and the fifth field in FileA (account number) compared with the third field in FileB (account number), to see whether an existing customer has opened another account, e.g. had a savings account and then opened a cheque account later. You will notice that a portion of the account number matches the CIF number.

Notice that Harry has opened another account and his customer number stays the same (16), so the record with the new account needs to be taken into account. Guys, I have tried using this script that I made but it does not work. Why?

% new account needs to be taken into account. Guys I have tried using this
% script that I made but is does not work, why??

% BEGIN { FS = OFS = "|"
% while (getline <"A") {

Some indentation would help make what this is doing a bit clearer. This while loop is going over all the lines of A. It's executed once per script invocation.

% cifa = $3; acca = substr($5, 5, 5)
% while (getline <"B") {

This while loop is going over all the lines of B. It's executed once per line of A.

% cifb = $1; accb = substr($3, 5, 5); name = $2; accfull = $3
% }

This is the end of the B loop. The next time you execute the B loop, getline will fail, because it's already at the end of B. You need to put this here:

    close("B")

% }

This is the end of the A loop.

% {if (cifb != cifa || accb != acca ) {
% print cifb, name, accfull
% }
% }

This block is being executed outside both loops. You're comparing only the last line of A to the last line of B. You need to do this inside the B loop.

% }

All that aside, if you have large files, and especially if file B is large, this will be quite inefficient. What I would do is read the file that's being updated into an array, then loop on the other file comparing values, e.g.

    BEGIN {
        ARGC=2
        ARGV[1] = "B"

I would like to use arrays but I have been trying to avoid them as I do not know how to use them, and I am not familiar with the ARGC and ARGV functions as my books do not explain them well. How would I put the file to be updated into an array and then use these functions?

Quote:

>[...]

>I would like to use arrays but I have been trying to avoid them as I do not
>know how to use them, and I am not familiar with the ARGC and ARGV functions
>as my books do not explain them well. How would I put the file to be updated
>into an array and then use these functions?

Example of reading a file into an array then using it to match against a second file.

Explanation: the first field from FileA is stored as the index into the array recA, and the entire record in FileA is the corresponding value. I'm assuming here that the first field is a key field - IMO updating individual records in unkeyed database tables is one of the more futile exercises in programming. For the remaining input files, if the first field matches one from FileA, replace the record with the record from FileA; otherwise, keep the current record.
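
The example itself did not survive in this copy of the thread. A minimal sketch of what the explanation describes might look like the following; the file names FileA and FileB, the array name recA and the first-field key come from the explanation, while the sample records are hypothetical and exist only so the script can be run.

```shell
#!/bin/sh
# Hypothetical sample data; the real files from the thread are not shown.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '16|Harry|NEW' '20|Sally|NEW' > FileA
printf '%s\n' '16|Harry|old' '18|Tom|old'   > FileB

out=$(awk -F'|' '
    # First file: store each whole record under its key (field 1).
    FILENAME == "FileA" { recA[$1] = $0; next }
    # Remaining files: use the FileA record when the key matches.
    ($1 in recA) { print recA[$1]; next }
    # Otherwise keep the current record unchanged.
    { print }
' FileA FileB)
printf '%s\n' "$out"
```

Note that a key present only in FileA (Sally in this made-up data) is never printed, because output is driven by the records of the later files.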

Trying to use awk for nontrivial tasks, especially database applications, without using arrays is not an intelligent course of action. Arrays provide rather critical functionality for this sort of thing. Better you should learn arrays before writing many more of these database scripts. Experiment with small scripts.

In all cases involving getline, you should be aware of the possibility of an error return if the file can't be accessed. Although it's appealing to write

while (getline <"file") ... # Dangerous

that's an infinite loop if file doesn't exist, because with a nonexistent file getline returns -1, a nonzero value that represents true. The preferred way is

while (getline <"file" > 0) ... # Safe

Here the loop will be executed only when getline returns 1.
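
To see the difference without risking an infinite loop, here is a small sketch. The path is a hypothetical one that is assumed not to exist, and the extra parentheses around the getline expression are only there to make the grouping explicit.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    file = "/nonexistent/hypothetical-file"   # assumed not to exist
    status = (getline line < file)            # 1 = record read, 0 = end of file, -1 = error
    print "getline returned " status
    n = 0
    # Safe form: the body runs only while a record was actually read.
    while ((getline line < file) > 0)
        n++
    print "records read: " n
}')
printf '%s\n' "$out"
```

With the unsafe form, the -1 would count as true and the loop would never end.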

Quote:

> ... it gives me a wierd output??

For non-huge data sets, use associative arrays to solve this simple match/merge problem. Read the smaller file first, store the key values as subscripts to an associative array and the required data as their corresponding elements, then process the larger file and use awk's "var in arr" construct to look up matching records. (This method has been demonstrated in other articles within this thread.)
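
As a rough sketch of that method applied to the original question: the field positions (CIF in field 3 and account in field 5 of A, CIF in field 1 and account in field 3 of B) are taken from the first post, but the sample records, the variable name keyfile and the array name known are made up here. It also assumes A is the smaller file and compares whole account numbers rather than the substr() slices of the original script.

```shell
#!/bin/sh
# Hypothetical sample data laid out as described in the first post.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' 'r1|x|16|Harry|000016001' 'r2|x|17|Sally|000017001' > A
printf '%s\n' '16|Harry|000016001' '16|Harry|000016002' \
              '17|Sally|000017001' '18|Tom|000018001'    > B

out=$(awk -F'|' -v keyfile=A '
    BEGIN {
        # Store every (CIF, account) pair of the key file as an array subscript.
        while ((getline line < keyfile) > 0) {
            split(line, f, "|")
            known[f[3], f[5]] = 1
        }
        close(keyfile)
    }
    # Print only the B records whose (CIF, account) pair is not already known.
    !(($1, $3) in known)
' B)
printf '%s\n' "$out"
```

In this made-up data the lines printed are Harry's second account and the new customer Tom.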

For data sets that are too big to be stored in memory, or if you just can't be bothered to learn to program with aggregate variables, order Monty's Magical Match/Merge awk script from Ronco today. It's on sale now for not $399.95, not $299.95, but ONLY $199.95!

-- Jim Monty

Tempe, Arizona USA

Thu, 31 Jan 2002 03:00:00 GMT

Jim Mont#8 / 18

Quote:

> > I would like to use arrays but I have been trying to avoid them as I do not
> > know how to use them, and I am not familar with the ARGC and ARGV functions
> > as my books do not explain them well. How would I put the file be updated
> > into an array and then use these functions!

> Example of reading a file into an array then using it to match against a
> second file.

Using this trick

    FILENAME == "firstfile.txt" { ... ; next }
    { ... }

to process the first file separately from the second and subsequent files is handy for one-liners (well, two-liners ;-), but it cannot be recommended for use in nontrivial awk programs. It is inefficient to test what file you're currently processing at each input record. This is what BEGIN is for.
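
One way to read "this is what BEGIN is for" is sketched below; it is a guess at the idiom, not code from the thread. The first file is loaded in BEGIN with getline, and its ARGV entry is then blanked so the main loop skips it. The file names, sample records and the array name recA are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data; the real files from the thread are not shown.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '16|Harry|NEW' '20|Sally|NEW' > FileA
printf '%s\n' '16|Harry|old' '18|Tom|old'   > FileB

out=$(awk -F'|' '
    BEGIN {
        first = ARGV[1]
        # Load the first file once, before the main input loop starts.
        while ((getline < first) > 0)
            recA[$1] = $0
        close(first)
        ARGV[1] = ""    # an empty ARGV element is skipped as an input file
    }
    # The main loop now sees only the remaining files, with no FILENAME test.
    { print (($1 in recA) ? recA[$1] : $0) }
' FileA FileB)
printf '%s\n' "$out"
```

The invocation stays the same as with the FILENAME test; only the place where the first file is read changes.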

> Trying to use awk for nontrivial tasks, especially database applications,
> without using arrays is not an intelligent course of action. Arrays provide
> rather critical functionality for this sort of thing. Better you should learn
> arrays before writing many more of these database scripts. Experiment with
> small scripts.

Learn to program? Learn awk? What the hell are you talking about, Man? This is comp.lang.awk.programs.for.free! Your advice is off-topic.

-- Jim Monty

Tempe, Arizona USA

Thu, 31 Jan 2002 03:00:00 GMT

Harlan Gro#9 / 18

writes:

<snip>

Quote:

>Using this trick

>    FILENAME == "firstfile.txt" { ... ; next }
>    { ... }

>to process the first file separately from the second and subsequent
>files is handy for one-liners (well, two-liners ;-), but it cannot
>be recommended for use in nontrivial awk programs. It is inefficient
>to test what file you're currently processing at each input record.
>This is what BEGIN is for.

<snip>

Touche.

Quote:

>And here are the results of my benchmark tests:

>    Jim's Idiom:
>        2.7 microseconds

>    Harlan's Trick:
>        5 hrs., 12 mins., 66.3 secs.

Are you sure you haven't worked for Microsoft?

<snip>

Quote:

>Learn to program? Learn awk? What the hell are you talking about,
>Man? This is comp.lang.awk.programs.for.free! Your advice is
>off-topic.

Given what Romiko is doing, I'm at a loss to know why he's not using one of the SQL programs available for linux (at least I think I remember he mentioned using red hat). The database module in StarOffice can write tables to text files (and thus have I exhausted my knowledge of unix DBMS offerings). I like awk, but I don't think I'd like my bank migrating account records using awk scripts.

On the other hand, maybe he can tell us who he's working for. I wonder if they'd like a mail merge program written in APL? Or maybe a natural language query system written in Forth?

Thu, 31 Jan 2002 03:00:00 GMT

Romik#10 / 18

Howzit dudes, I got the program working, however it was by mistake, he he! I do not understand why it works. Here is the program, the two files and the output. I use the command awk -f report.awk A B > output.

PLEASE can someone tell me why it works, coz when I wrote this script I thought it would print records that are the same, but it prints the differences (which is what I want!!!!). Also, how can I get rid of those empty lines in the output file? Thanks DUDES!

Harlan wrote:

> I like awk, but I don't think I'd like my bank migrating account records using awk scripts.

I do not understand why not to use awk. I am a Microsoft engineer and have used Access and SQL 6.5, and yes, they can do the job. However, I feel that awk can automate most of the tasks, while SQL and Access need a lot of manual intervention. Secondly, I am sick of Microsoft, so I decided to try something different, and that is why I decided to use Linux with awk. I did not use the other utilities available on Red Hat Linux because I do not know how to use them and there is just not enough time for me to learn StarOffice etc. I am working for Nedcor International (NedBank).

Thu, 31 Jan 2002 03:00:00 GMT

Patrick TJ McPh#12 / 18

% do not understand why it works here is the program and the two files and the

It's not clear at this point what you're trying to achieve. Anyway, it cannot possibly be giving the output you claim, based on the input you claim and the invocation you claim.

Now, why does this work? Stop and take a few deep breaths. Meditate. Here's a mantra that I found helpful when I was a consultant: in late, out early, two-hour lunch.

Now that you've calmed down, try and think about what the hell it is you're trying to achieve, and then look at the very, very short program above and try to figure out why that program does whatever that thing is. It is only through this kind of mental exercise that you will attain enlightenment. Spend a week on it, if you have to. If you don't have a week, then don't worry about why it works. If you _have_ to know why it works, then spend a week on it, if you have to. --

>Now, why does this work? Stop and take a few deep breaths. Meditate. Here's
>a mantra that I found helpful when I was a consultant:
>    in late, out early, two-hour lunch.

>Now that you've calmed down, try and think about what the hell it is you're
>trying to achieve, and then look at the very, very short program above
>and try to figure out why that program does whatever that thing is. It
>is only through this kind of mental exercise that you will attain
>enlightenment. Spend a week on it, if you have to. If you don't have a week,
>then don't worry about why it works. If you _have_ to know why it works,
>then spend a week on it, if you have to.
>--

--------------

In this line of code

    FILENAME == "A" {if ($3 in cusrega) print curega[$3] ; else print

I made a typing mistake: "curega[$3]" should be "cusrega[$3]". So what actually happened was that the else statement gave me the diffs between the two files. Weird that I made this mistake, but hey. Patrick's advice to meditate is cool, but that mantra, naaa.

Cheers dudes.
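
The misspelling may also explain the empty lines asked about earlier: awk treats an unknown array name as a new, empty array, so printing one of its elements prints an empty string. The full program is not shown in the thread, so the following is only a minimal sketch with a hypothetical one-entry lookup table and made-up input.

```shell
#!/bin/sh
out=$(printf '%s\n' '16' '17' | awk '
    BEGIN { cusrega["16"] = "16|Harry" }    # hypothetical lookup table
    {
        if ($1 in cusrega)
            print curega[$1]    # misspelled name: a different, empty array
        else
            print               # unmatched records are printed as they are
    }
')
printf '%s\n' "$out"
```

The matched key produces a blank line and the unmatched one is printed unchanged, which fits output that shows only the differences plus some empty lines.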