13. Perl Program to translate DNA into protein

- Introductory Workbook on Perl for Biology Students

The objective of the program is to read DNA from a file or from manual entry
and convert the DNA string into a protein string. For this purpose we first
write a subroutine that translates a three-letter DNA codon into its amino
acid. The subroutine contains a hash variable called “%genetic_code”, which
stores each DNA codon as a key and the corresponding amino acid as its value.
When the subroutine receives a codon, it converts the codon to uppercase (as in
line 3, with the command “uc”) and then checks whether that codon exists in the
hash. If it does, the subroutine returns the amino acid; otherwise it prints an
error message. So in the end the subroutine yields either the amino acid or the
error.
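A minimal sketch of such a subroutine is shown here. The codon table is cut down to a few entries for illustration (the real table has 64 codons), and the “*” symbol for stop codons, the wording of the error message and the empty-string return value on failure are assumptions, not taken from the original program.

```perl
use strict;
use warnings;

# Translate one three-letter DNA codon into its one-letter amino acid.
sub codon2aa {
    my ($codon) = @_;

    # Abbreviated table for illustration only; the full genetic code
    # has 64 entries. '*' for stop codons is an assumed convention.
    my %genetic_code = (
        'ATG' => 'M',   # Methionine
        'TTT' => 'F',   # Phenylalanine
        'TTC' => 'F',   # Phenylalanine
        'GCT' => 'A',   # Alanine
        'GGA' => 'G',   # Glycine
        'TAA' => '*',   # Stop
    );

    $codon = uc $codon;                     # accept lowercase input

    if (exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }

    # Assumed error handling: report the problem and return an empty string.
    print STDERR "Bad codon \"$codon\"\n";
    return '';
}

print codon2aa('atg'), "\n";
```

Using “exists” checks for the key itself, so the test does not depend on what value is stored for the codon.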

We now ask the user to choose 1 to enter the DNA manually or 2 to read the DNA
from a file. With manual entry, the user types the DNA on the screen, a chomp
removes the trailing newline (the Enter character), and the result is stored in
$dna. With a file, the file is opened, its contents are read into an array, and
the elements of this array are joined to form one complete DNA string, which is
stored in $dna.
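The two input routes could be sketched as follows. The subroutine names read_dna_manual and read_dna_file are hypothetical, and the chomp applied to the file lines before joining is an assumption (without it the newlines would end up inside $dna). The demonstration reads from an in-memory string instead of the keyboard so that it runs without user interaction.

```perl
use strict;
use warnings;

# Manual entry: read one line from a handle (normally STDIN)
# and strip the trailing newline.
sub read_dna_manual {
    my ($in) = @_;
    my $dna = <$in>;
    $dna = '' unless defined $dna;   # nothing typed / end of input
    chomp $dna;
    return $dna;
}

# File entry: read all lines into an array, then join them
# into one continuous DNA string.
sub read_dna_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;                    # assumed: drop the newline on every line
    return join('', @lines);
}

# Interactive use would look roughly like this (prompt text is made up):
#   print "Enter 1 for manual entry or 2 for a file: ";
#   chomp(my $choice = <STDIN>);
#   my $dna = $choice == 1 ? read_dna_manual(\*STDIN)
#                          : read_dna_file($filename);

# Non-interactive demonstration using an in-memory "keyboard".
my $typed = "atggct\n";
open(my $demo, '<', \$typed) or die "Cannot open in-memory handle: $!";
my $dna = read_dna_manual($demo);
close $demo;
print "$dna\n";
```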

We have used the keyword “my” in front of some of the variables. It declares
the variable as private (lexically scoped): the variable is visible only inside
the enclosing block, subroutine or file in which it is declared, and code
outside that scope cannot see it.
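A small illustration of this scoping, with the subroutine name first_codon made up for the example:

```perl
use strict;
use warnings;

my $dna = 'ATGGCT';    # declared at file level: visible to the code below

sub first_codon {
    # $codon is private to this subroutine.
    my $codon = substr($dna, 0, 3);
    return $codon;
}

print first_codon(), "\n";

# Uncommenting the next line would stop the program from compiling under
# "use strict", because $codon is not declared in this scope:
# print $codon;
```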

Now we initialize the variables $protein and $codon. We use a “for loop” to go
through the DNA three letters at a time: each three-letter codon is saved in
$codon and passed to the subroutine to get the corresponding amino acid, which
we then save in $protein. In this way the for loop scans the DNA string until it
reaches the end.

The for loop (for(my $i=0; $i<(length($dna)-2); $i+=3)) initializes the
variable $i to 0 and increments it by 3 on each pass. The second element of the
for loop, $i<(length($dna)-2), stops the loop once fewer than three letters
remain, so one or two leftover letters at the end of the DNA are not translated.
You then take a substring of the DNA using the command in line 102
(substr($dna,$i,3)); that is, you take the substring of the DNA starting at
position $i with a length of 3. This gives the three-letter codon from the DNA
string. This codon is then passed to the subroutine named “codon2aa”. The codon
is converted to uppercase, the if condition checks whether an amino acid exists
for this codon, and the subroutine returns that amino acid, which is appended to
$protein. Each time the for loop executes, the new amino acid is concatenated to
the existing value of $protein and saved back in $protein, building up one
complete protein sequence. We can find the length of the protein sequence with
the command “length”.
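The loop described above can be sketched as follows. The wrapper subroutine translate and the cut-down codon table are assumptions made to keep the example short and self-contained; here an unknown codon simply contributes nothing to the protein.

```perl
use strict;
use warnings;

# Abbreviated codon table for illustration only (the full table has 64 entries).
my %genetic_code = (
    'ATG' => 'M',
    'GCT' => 'A',
    'TTT' => 'F',
    'GGA' => 'G',
);

# Simplified codon2aa: returns an empty string for an unknown codon.
sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;
    return exists $genetic_code{$codon} ? $genetic_code{$codon} : '';
}

# Walk along the DNA three letters at a time and build the protein.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i < (length($dna) - 2); $i += 3) {
        my $codon = substr($dna, $i, 3);    # three letters starting at $i
        $protein .= codon2aa($codon);       # append the new amino acid
    }
    return $protein;
}

my $protein = translate('atggctttt');
print "$protein\n";                 # the protein sequence
print length($protein), "\n";       # number of amino acids
```

For the input 'atggctttt' the loop runs with $i equal to 0, 3 and 6, so three codons are translated. If the DNA had eight letters instead of nine, the loop would stop after $i = 3 and the last two letters would be ignored.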