I have written a script that lets the user pick out sequences from a file that fall within a specific length range. I am now trying to restructure this script using subroutines, but I am confused about declaring and invoking them, and about how to pass values from one subroutine into the invocation of another.

My initial script is as follows:

Code

open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;

for $id (@seqarray){
    if (-f $id){print $id, " already exists. It is about to be overwritten"};
    open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
    print new_F ($id."\n".$seq{$id}."\n");
    close new_F;
}

close F;

Below is my amateur attempt at creating subroutines:

Code

##Invoking subroutines

my ($this_id, %that_seq) = HashSequences();
SpecifySeqLengths();

## Open file. Hash sequences. Make the sequence IDs the keys to their
## respective (hashed) sequences.

sub HashSequences{
    open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;

    for $id (@seqarray){
        if (-f $id){print $id, " already exists. It is about to be overwritten"};
        open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
        print new_F ($id."\n".$seq{$id}."\n");
        close new_F;
    }
}

I'd really appreciate a conceptual explanation of what I need to do.

You should always put the following two pragmas at the top of your script.

Code

use strict;
use warnings;

In this case, they will give you a very large number of errors about two undeclared variables (%seq and $id).

We can get rid of the %seq errors by changing %that_seq to %seq. (This may not be the real solution, but at least it gives us a chance to work on the remaining errors.)

All the remaining errors refer to a few lines. When you examine that area, you should discover that you forgot the 'my' on two 'foreach' loops.

Code

foreach my $id (keys %seq){ ... }
for my $id (@seqarray){ ... }

Your code will now compile without errors, and may actually 'work'.

Next, notice that the variable $this_id is not used anywhere. Remove it. (You must also remove $id from the return statement.)

You are probably wondering how the data is passed. The subroutine HashSequences makes its own copy of %seq, and the return copies it back to the main program. The scope of the %seq declared in the main program extends from its declaration to the end of the file. The subroutine SpecifySeqLengths does not declare its own %seq, so the copy in main is still in scope and the subroutine uses it.
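A minimal sketch of that data flow, assuming the corrected invocation is my %seq = HashSequences(); with $this_id removed. The hash contents are hypothetical stand-ins for the FASTA data, and this version of SpecifySeqLengths only reports each ID with its length:

```perl
use strict;
use warnings;

# %seq is declared at file scope, so it is visible from this line to the
# end of the file, including inside the subroutines defined below.
my %seq = HashSequences();

my @report = SpecifySeqLengths();
print "$_\n" for @report;

# Builds a private %seq and returns a copy of it as a flat key/value list.
sub HashSequences {
    my %seq = (                  # hypothetical data in place of the FASTA file
        circ1 => 'ACGT',
        circ2 => 'ACGTACGT',
    );
    return %seq;                 # the caller's %seq receives a copy
}

# Declares no %seq of its own, so it silently uses the file-scoped one.
sub SpecifySeqLengths {
    return map { $_ . "\t" . length($seq{$_}) } sort keys %seq;
}
```

This compiles under strict only because %seq is declared before the subroutine bodies that mention it; that invisible dependency is why the style is discouraged.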

This method of passing data is usually considered poor practice. It really is not too bad for only one variable in such a small program.

After you get this much working, I will suggest a number of improvements. Whether it works or not, please post a copy of your corrected code and a small sample of data so we can run it. Good Luck, Bill

for my $id (@seqarray){
    if (-f $id){print $id, " already exists. It is about to be overwritten"};
    open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
    print new_F ($id."\n".$seq{$id}."\n");
    close new_F;
}

The subroutine HashSequences would be much simpler if you could read your data one sequence at a time rather than one line at a time. This is probably possible, but I cannot tell without seeing a sample of your input data. (That is one of the reasons I asked for it.)
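A minimal sketch of reading one sequence at a time, assuming the .fa file is standard FASTA (each record is a header line starting with '>', whose first word is the ID, followed by one or more sequence lines) with Unix line endings. read_fasta is a hypothetical name; it takes an already-open filehandle so the usage can run on an in-memory sample:

```perl
use strict;
use warnings;

# read_fasta (hypothetical helper) returns ID => sequence pairs.
# Setting $/ to '>' makes each <$fh> read return one whole FASTA record.
sub read_fasta {
    my ($fh) = @_;
    local $/ = '>';                       # restored automatically when the sub returns
    my %seq;
    while (my $record = <$fh>) {
        chomp $record;                    # strips the trailing '>' (the current $/)
        next unless length $record;       # skip the empty text before the first '>'
        my ($header, @lines) = split /\n/, $record;
        my ($id) = split ' ', $header;    # assumes the ID is the header's first word
        $seq{$id} = join '', @lines;      # rejoin sequence lines wrapped in the file
    }
    return %seq;
}

# Usage on a small in-memory sample (hypothetical data). For the real file:
# open my $fh, '<', 'human_hg19_circRNAs_putative_spliced_sequence.fa' or die $!;
my $sample = ">circ1 some description\nACGTACGT\nACGT\n>circ2\nGGCC\n";
open my $in, '<', \$sample or die "Cannot open sample: $!";
my %seq = read_fasta($in);
close $in;
print "$_\t", length($seq{$_}), "\n" for sort keys %seq;
```

Once each sequence arrives whole, the length-range check is a single length() call per record.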

Allowing the subroutine SpecifySeqLengths to use the copy of %seq in the main program rarely causes problems, but it can make errors much harder to find. The best practice is to always pass a reference to a data structure such as an array or hash (for example, \%seq) as an argument.

Everything else remains the same. Now, the subroutine has its own copy of the hash. It cannot make changes to the copy in the main program.
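A minimal sketch of passing a hash reference, assuming SpecifySeqLengths takes the length limits as extra arguments (the original signature is not shown). The private copy comes from my %seq = %{$seq_ref}; working through $seq_ref->{...} directly would skip the copy but would operate on the caller's hash. The data and limits are hypothetical:

```perl
use strict;
use warnings;

# Defined before main's %seq is declared, so this sub cannot see that hash
# by name: it works only with what it is passed.
sub SpecifySeqLengths {
    my ($seq_ref, $min, $max) = @_;
    my %seq = %{$seq_ref};     # private copy: changes here never reach the caller
    # Using $seq_ref->{$id} directly would avoid the copy, but would then
    # read (and could modify) the caller's hash.
    return grep {
        my $len = length $seq{$_};
        $len >= $min && $len <= $max;
    } sort keys %seq;
}

# Hypothetical data and limits standing in for the script's real values.
my %seq = (
    circ1 => 'ACGTACGTAC',            # 10 bases
    circ2 => 'ACG',                   # 3 bases
    circ3 => 'ACGTACGTACGTACGTACGT',  # 20 bases
);
my ($minlength, $maxlength) = (5, 15);

my @wanted = SpecifySeqLengths(\%seq, $minlength, $maxlength);
print "$_\n" for @wanted;
```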

You are using a very old style of open. New software should always use the three-argument form, in which the second argument specifies whether the file is for input or output and whether any special processing (an I/O layer such as an encoding) is required. The third argument is just the file name.
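A minimal sketch of the three-argument form. The file name SeqLength_demo.fa is hypothetical, and :encoding(UTF-8) is just one example of a layer. With the two-argument form, open parses the mode out of the file-name string itself, so a name that begins with '>' or ends with '|' changes what open does; the three-argument form takes the name literally:

```perl
use strict;
use warnings;

my $file = 'SeqLength_demo.fa';   # hypothetical output file

# Mode (and optional layers) in the second argument, file name alone in the third.
open my $out, '>:encoding(UTF-8)', $file
    or die "Cannot open '$file' for writing: $!";
print $out ">circ1\nACGT\n";
close $out or die "Cannot close '$file': $!";

# '<' opens for reading; '>>' would open for appending.
open my $in, '<:encoding(UTF-8)', $file
    or die "Cannot open '$file' for reading: $!";
my @lines = <$in>;
close $in;
print @lines;
```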

You should always use "lexical filehandles" (open my $fh, ...) rather than bareword handles such as F. If you need to copy them or pass them to a subroutine, they are just like any other scalar.
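A minimal sketch of a lexical filehandle being copied and passed to a subroutine like any other scalar. write_record is a hypothetical helper, and the in-memory file is only there so the example runs without touching the disk:

```perl
use strict;
use warnings;

# write_record is a hypothetical helper: it receives the filehandle as an
# ordinary scalar argument, just like any other value.
sub write_record {
    my ($fh, $id, $sequence) = @_;
    print {$fh} "$id\n$sequence\n" or die "Write failed: $!";
}

my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";   # demo only
my $copy = $out;               # copying a lexical filehandle is plain scalar assignment
write_record($out,  'circ1', 'ACGT');
write_record($copy, 'circ2', 'GGCC');
close $out;                    # $out and $copy refer to the same handle
print $buffer;
```

A lexical handle is also closed automatically when the last variable referring to it goes out of scope, which a global bareword such as F never is.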

You should always consider using a prompt module (my favorite is IO::Prompt::Hooked) for user input. In this case, you may consider it overkill, but that should be a conscious decision, not just a habit.

UPDATE: I forgot to mention that it is not necessary to open and close the output file for every write.

Code

open my $new_F, '>>', "SeqLength_$minlength-$maxlength" or die $!;
for my $id (@seqarray){
    if (-f $id){warn "$id already exists. It is about to be overwritten"};
    print $new_F ($id."\n".$seq{$id}."\n");
}
close $new_F;

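Your file test does not make any sense. -f $id checks whether a file named after the sequence ID exists, but the file you write is "SeqLength_$minlength-$maxlength", and because it is opened in append mode ('>>'), nothing is overwritten anyway. I do not understand what you intend.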
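If the intent was to warn before replacing an existing output file (an assumption; the original does not say), a minimal sketch tests the output file name once, before the loop, and opens it with '>' so that it really is overwritten. The values of $minlength, $maxlength, %seq and @seqarray are hypothetical stand-ins:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for values computed earlier in the script.
my ($minlength, $maxlength) = (100, 500);
my %seq      = (circ1 => 'ACGT', circ2 => 'GGCC');
my @seqarray = sort keys %seq;

my $outfile = "SeqLength_$minlength-$maxlength";

# Test the file that is about to be written, not a sequence ID.
warn "$outfile already exists. It is about to be overwritten.\n" if -e $outfile;

# '>' truncates an existing file; '>>' would append to it instead.
open my $new_F, '>', $outfile or die "Cannot open '$outfile': $!";
for my $id (@seqarray) {
    print $new_F "$id\n$seq{$id}\n";
}
close $new_F or die "Cannot close '$outfile': $!";
```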