I am trying to write a script that scans text files, extracts data from them, and searches that data for runs of a's followed by t's (or just a's, or just t's) that are between 4 and 7 characters long, then outputs the start and end points of each of those runs. Suffice it to say it does not work, and I don't know why. Here is my code so far:

```perl
#!/usr/bin/perl -w
use strict;
use warnings;
```

Don't just say "it does not work". Explain the output you expect and the output you get instead. Also, use code tags around your code: code of more than a few lines is hard to read when it is formatted like normal text.

To answer the questions so far: yes, the input file is test.gbk. It is a mock genome file that I'm using to make sure the script works before I bring in the real ones. The first regex is not one piece because there should be a space after every ten characters. I kept fixing errors until no more came up, but it still doesn't produce the correct output.

I have run into a new problem now, though. Something is clearly wrong with my sum loops. The terminal output is:

    Use of uninitialized value in addition (+) at atract.pl line 98, <$fh> line 136.
    Use of uninitialized value in addition (+) at atract.pl line 106, <$fh> line 136.

Your problem is not in your arithmetic. You are not even creating the genome string. It would be easier to read the whole file and delete everything except the symbols a, c, g, and t.


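```perl
# Slurp the whole file into $genome. "local $/" undefines the input record
# separator only inside this block, so <$fh> returns the entire file.
# ($INPUT_RECORD_SEPARATOR would need "use English" and would not compile
# under "use strict" without it.)
my $genome = do {
    open( my $fh, '<', 'test.gbk' ) or die "cannot open < test.gbk: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    $text;
};

# Keep only the base letters: drops the spaces, digits and newlines.
$genome =~ s/[^agct]//g;
```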
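With a real GenBank file, stripping every character except a, c, g and t from the whole file would also keep letters from the lowercase header and feature text (for example "linear" on the LOCUS line), which would corrupt the positions. A minimal sketch, assuming the standard layout in which the bases follow an ORIGIN line and end at a // line, restricts the cleanup to that section. The sub name origin_sequence and the sample record are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the bases from the ORIGIN section of GenBank-formatted text,
# or '' if there is no ORIGIN section.
# Assumes: "ORIGIN" line, numbered lines of 10-base blocks, then "//".
sub origin_sequence {
    my ($text) = @_;
    my ($block) = $text =~ m{^ORIGIN[^\n]*\n(.*?)^//}ms
        or return '';
    $block = lc $block;        # tolerate upper-case mock files
    $block =~ s/[^acgt]//g;    # drop line numbers, spaces and newlines
    # Note: other codes such as n are dropped too, which shifts positions.
    return $block;
}

# Hypothetical mock record.
my $gbk = <<'END';
LOCUS       TEST                      20 bp    DNA     linear
DEFINITION  Mock sequence.
ORIGIN
        1 aaaattgcgt tttcaaaaaa
//
END

print origin_sequence($gbk), "\n";    # prints aaaattgcgttttcaaaaaa
```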

Your first if is only executed once, and always fails.

The regular expression in your while statement always fails to match. The loop is never executed. If it did match, the loop would never terminate.
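One way to write such a loop, assuming the goal is maximal runs (so a run of nine a's is skipped as a whole rather than reported as shorter pieces), is to match with the /g modifier. In scalar context each test of the condition then resumes at pos(), so the loop ends once no further match exists. The special arrays @- and @+ hold the offsets of the match. The sub name find_at_runs and the test sequence are hypothetical, not taken from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: find maximal runs of a's followed by t's (or only a's, or only
# t's) whose length is between $min and $max.
# Returns 1-based, inclusive [start, end] pairs.
sub find_at_runs {
    my ( $seq, $min, $max ) = @_;
    my @runs;
    # /g makes each iteration continue from pos($seq), so the loop ends.
    while ( $seq =~ /(a+t*|t+)/g ) {
        my $len = length $1;
        next if $len < $min || $len > $max;
        # $-[1] is the 0-based start of group 1, $+[1] is one past its end.
        push @runs, [ $-[1] + 1, $+[1] ];
    }
    return @runs;
}

# Hypothetical test sequence.
my $genome = 'gc' . 'aaaatt' . 'gcg' . 'tttt' . 'c' . ( 'a' x 9 ) . 'gc';
printf "%d-%d\n", @$_ for find_at_runs( $genome, 4, 7 );
# prints:
# 3-8
# 12-15
```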

It is still not clear to me what you are trying to do. Consider the contrived genome “aaaaaaattttttt”. I count 38 subsequences that appear to meet your conditions (eight with 7 bases each, nine with 6 bases each, ten with 5 bases each, and eleven with 4 bases each). Do you intend to find the start and end of each of them? If not, which ones? Are you sure that this explosion of sequences can never occur in nature? Explain.
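If every qualifying window is wanted, a brute-force sketch that checks each start position and each allowed length reproduces the count above. It reports overlapping and nested windows, which is exactly the explosion described. The sub name all_at_windows is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: list every substring of length $min..$max that consists of a's
# followed by t's (all-a and all-t windows included).
# Positions are 1-based and inclusive.
sub all_at_windows {
    my ( $seq, $min, $max ) = @_;
    my $n = length $seq;
    my @hits;
    for my $start ( 0 .. $n - 1 ) {
        for my $len ( $min .. $max ) {
            last if $start + $len > $n;    # window would run off the end
            push @hits, [ $start + 1, $start + $len ]
                if substr( $seq, $start, $len ) =~ /\Aa*t*\z/;
        }
    }
    return @hits;
}

my @hits = all_at_windows( 'aaaaaaattttttt', 4, 7 );
print scalar(@hits), " windows\n";    # prints "38 windows"
```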