I would like to subset the data multiple times along column 8. Then, for each subset, I want to count how many times columns 4 and 5 do not match, and likewise for columns 6 and 7. The only way I could think of was to mix bash and awk, but it does not seem to work.

Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.

Could you please explain what you want a little more clearly? Show us exactly what the output should look like for the above example.

The first thing I noticed:

Code:

for(i=1, i=120, i++){
grep $8>=i | awk

The first part of this command is shell syntax, not awk syntax. For example, $8 here is expanded by the shell as its eighth positional parameter, not as awk's eighth field. The for loop syntax is also wrong: it is valid in neither bash nor awk.
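To illustrate the difference, here is a minimal sketch of a loop over thresholds on column 8 written so that awk, not the shell, sees $8. The sample rows and the function names sample_data and count_rows_up_to are made up for illustration (the real file may look different). The only points being shown are the single quotes around the awk program and the semicolons in the for loop.

```shell
# Hypothetical sample: 8 whitespace-separated columns; the real data may differ.
sample_data() {
cat <<'EOF'
a b c 1 1 2 2 3
a b c 1 2 2 3 4
a b c 1 1 2 2 9
EOF
}

# Single quotes stop the shell from expanding $8, so awk reads it as field 8.
# awk's for loop separates its three clauses with semicolons, as in C.
count_rows_up_to() {
    awk -v max="$1" '
        { for (i = 1; i <= max; i++) if ($8 <= i) rows[i]++ }
        END { for (i = 1; i <= max; i++) print i, rows[i] + 0 }
    '
}

# For each threshold i from 1 to 4, print how many rows have column 8 <= i.
sample_data | count_rows_up_to 4
# 1 0
# 2 0
# 3 1
# 4 2
```

Looping over every threshold for every row is simple but slow for large files; counting per value and accumulating at the end avoids that.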

rafir

04-10-2012 01:20 PM

That's true. I am mixing up syntaxes. So for i = 1 to 18 I would want an output of:

3 0 0
4 1 1
...
12 2 1
13 2 2
14 3 3
15 3 4
16 3 6
17 3 6
18 4 6

grail

04-10-2012 01:41 PM

Yeah, you've still lost me :( Maybe you could explain how you are measuring the data you have shown. For example, in 18 4 6, I follow that 18 comes from the last column, but I have absolutely no idea how you arrived at the other 2 numbers???

amani

04-10-2012 01:49 PM

@grail, 4 and 6 must be the numbers of nonmatches (see the 1st post)

rafir

04-10-2012 01:58 PM

amani is right. When the last column is <= 18, there are 4 nonmatches where $4 != $5, and 6 nonmatches where $6 != $7.

grail

04-10-2012 02:05 PM

Okay ... so it is cumulative ... nice to know :)

Next silly question: when the last column is the same number, i.e. rows 2 and 3 both end in a 4, are we not to output the information until there is a change in the last column?

As an example if we were not doing it per change, the output would be:

Code:

3 0 0
4 0 0
4 1 1

It may seem like an odd question given that it is your own output example; however, your example also includes values not present in the original data, such as 17.

rafir

04-10-2012 02:38 PM

Yes, only output when i changes value, and every value of i should get exactly one entry. So the complete output from above would be:

The loop in the END section terminates at the last number encountered in the last column at the end of the file. Replace $NF with 120 if you already know that this is the last/maximum value (or if the numbers in the last column are not sorted in ascending order). Moreover, please note that, as written, this is a script interpreted by awk (see the sha-bang on the very first line). Hope this helps.
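The script being described is not reproduced here, so what follows is only a sketch of one way to get cumulative counts of this kind, not the original code. It assumes whitespace-separated fields and integer values in the last column. The sample rows and the names sample_rows and cumulative_mismatches are invented. Unlike the description above, it tracks the smallest and largest last-column values itself instead of using $NF in END, so it does not depend on the input being sorted.

```shell
# Hypothetical sample rows; columns 4-5 and 6-7 are the pairs being compared.
sample_rows() {
cat <<'EOF'
x x x A A C C 3
x x x A G C T 4
x x x A A C C 4
x x x G G C T 6
EOF
}

cumulative_mismatches() {
    awk '
        {
            key = $NF + 0                    # force a numeric index
            pair_one[key] += ($4 != $5)      # adds 1 only on a mismatch
            pair_two[key] += ($6 != $7)
            if (NR == 1 || key < first) first = key
            if (NR == 1 || key > last)  last  = key
        }
        END {
            if (NR == 0) exit                # no input, nothing to print
            # One line per value of i, including values absent from the data.
            for (i = first; i <= last; i++) {
                total_one += pair_one[i]
                total_two += pair_two[i]
                print i, total_one, total_two
            }
        }
    '
}

sample_rows | cumulative_mismatches
# 3 0 0
# 4 1 1
# 5 1 1
# 6 1 2
```

Values such as 5, which never occur in the last column of the sample, still get a line because the END loop steps through every integer and simply carries the running totals forward.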

grail

04-11-2012 04:50 AM

Right ... so now that I have all the information, you might want something like:

Currently it does not print zeros but I am sure you can change as need be :)

rafir

04-11-2012 01:32 PM

When run on the data above, both scripts produce huge (but not identical) files that seem to come from infinite loops.

What is the meaning of the syntax:

pair_one[$NF] = pair_one[$NF] + ( $4 != $5 )

colucix

04-11-2012 02:34 PM

Quote:

Originally Posted by rafir
(Post 4650356)

When run on the data above, both scripts produce huge (but not identical) files that seem to come from infinite loops.

How did you run the code? Please show us what you entered on the command line, what you got, and the content of the current version of your script, preferably using CODE tags to make it more readable.

Quote:

Originally Posted by rafir
(Post 4650356)

What is the meaning of the syntax:

pair_one[$NF] = pair_one[$NF] + ( $4 != $5 )

This means that the $NF-th element of the array pair_one (that is, the element whose index equals the value of the last field of the current record) is set to its previous value plus the value returned by the expression

Code:

( $4 != $5 )

In awk (and similarly in many programming languages) a logical expression evaluates to 0 if it is false and to 1 if it is true. Hence the count is increased by 1 if the two fields differ and is left unchanged if they are equal. Hope it's a bit clearer now.
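A quick way to convince yourself of this is to print the value of a comparison directly. The following is just a small demonstration with made-up values (the function name comparison_demo is arbitrary), not part of any script in this thread:

```shell
comparison_demo() {
    awk 'BEGIN {
        differ = ("A" != "G")    # true, so the expression yields 1
        same   = ("A" != "A")    # false, so the expression yields 0
        print differ, same

        count = 0
        count = count + ("A" != "G")    # mismatch: count goes up by 1
        count = count + ("A" != "A")    # match: count stays the same
        print count
    }'
}

comparison_demo
# 1 0
# 1
```

The parentheses around each comparison matter in practice, since they keep the comparison from being parsed together with the surrounding addition.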

grail

04-12-2012 04:18 AM

I am with colucix. I have run the code on the given example and it generates the exact output you have given.