3 Answers
3

To get the header as well, just add this to the beginning of the script:

BEGIN{print "CHR SNP MAF P"}

Explanation:

First of all, when two files are passed to awk, they are processed one after another. There are two variables important here: NR is the line number from the beginning of the awk command, and FNR is the line number from the beginning of the current file. That is, when the first file is processed (here file2), NR and FNR have the same value, which is the value of the line currently processed. But when awk pass to second file, FNR is reset to 1, so that NR and FNR are no longer the same. So that the test FNR==NR is a trick for knowing if the file processed is the first or not.

So let's see the code. The condition FNR==NR && FNR>1 tests whether we are processing the first file and not the first line. If it's the case, we store the value of fifth column (MAF) in an array indexed by the second one (SNP) and then the next statement says to pass to the following line.

When awk processes the second file (which is file1), the first test is false, so that awk tries the second test: FNR > 1 && ($2 in a) && $3 == "ALL", that is: not the first line of the file + second column value (SNP) exists in table a + third column value (TEST) is "ALL". If it is the case, then it prints column 1 (CHR) and two (SNP), gets the MAF value from the array with a[$2], and then prints column nine (P).

Adding a BEGIN{...} statement at the beginning adds a command which is run only before the first line is processed.