Each sequence is preceded by a sequence identifier line, which always begins with “>”.

Here’s a quick explanation of how it works, as I currently understand it:

!/^>/ {next}

– If a line (i.e. record) begins with a “>”, go to the next line (record).

{getline seq}

– “getline” reads the next record and assigns the entire record to a variable called “seq”

length(seq) >= 200

– If the length of the “seq” record is greater than, or equal to, 200 then…

{print $0 "\n" seq}

– Print the current record ($0, which is the identifier line), then a newline (“\n”), then the sequence stored in “seq”
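
To try the four pieces together, here is a minimal sketch on a tiny test file. The file name demo.fasta, its contents, and the threshold of 8 (lowered from 200 so the example stays short) are assumptions for illustration only.

```shell
# Build a small single-line FASTA file (hypothetical test data).
printf '>short\nACGT\n>long\nACGTACGTAC\n' > demo.fasta

# The same four pieces as above, with the threshold lowered from 200 to 8
# so that the short test sequences can exercise the length check.
result=$(awk '!/^>/ { next } { getline seq } length(seq) >= 8 { print $0 "\n" seq }' demo.fasta)

echo "$result"
```

Only the “>long” entry and its sequence should be printed, because the “>short” sequence is just 4 characters long.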

Important note: this will only work on sequences that exist on a single line in the file. If the sequence wraps to multiple lines, the code above will not work. You can fix your FASTA files so that the sequences for each entry exist on single lines:
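
One possible way to do this, offered as a sketch rather than as the original command: join every wrapped sequence onto a single line with awk. The file names wrapped.fasta and unwrapped.fasta and the test data are hypothetical.

```shell
# Wrapped (multi-line) FASTA test data; file names are hypothetical.
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\n' > wrapped.fasta

# Header lines: finish the previous sequence (if any) with a newline,
# then print the header on its own line.
# Sequence lines: print without a trailing newline so they join together.
awk '/^>/ { if (NR > 1) printf "\n"; print; next }
     { printf "%s", $0 }
     END { printf "\n" }' wrapped.fasta > unwrapped.fasta

cat unwrapped.fasta
```

After this step each identifier line is followed by exactly one sequence line, which is the layout the length filter above expects.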

The code is correct (although roundabout, especially in how it filters on “>”), but the explanation is incorrect.
1. !/^>/ {next}
a) !/^>/ : matches lines that do not start with “>” (emphasis on “do not”).
b) {next} : skips those lines and moves on to the next record, so only lines that start with “>” (the header lines) reach the rules that follow.
2. {getline seq} : reads the line after the header into “seq”. Reaching the header through a negated pattern plus “next” is a double negation. The code should instead have been direct, i.e. step 1a should match the header lines themselves.
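
To see the first rule in isolation, here is a small sketch. The four input lines are made up for illustration, and the second rule simply prints whatever gets past the first one.

```shell
# Hypothetical input: two header lines and two sequence lines.
# Lines that do not start with ">" hit `next` and are dropped,
# so only the header lines reach the second rule and get printed.
passed=$(printf '>seq1\nACGT\n>seq2\nGGCC\n' | awk '!/^>/ { next } { print }')

echo "$passed"
```

Only “>seq1” and “>seq2” are printed, which shows that the negated pattern plus “next” acts as a filter that keeps header lines.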

Pick up the lines that start with “>” and then store the next line in a variable. The code should have been: