
Function

Back-translate a protein sequence to a nucleotide sequence

Description

backtranseq reads a protein sequence and writes the nucleic acid sequence it is most likely to have come from.

Algorithm

backtranseq uses a codon usage table that gives the frequency of usage of each codon for each amino acid. For each amino acid in the input sequence, the most frequently occurring codon for that amino acid is written to the output nucleic acid sequence.
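The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the EMBOSS implementation: the names CODON_USAGE, most_frequent_codon and back_translate are hypothetical, the table covers just four symbols, and the frequencies are made-up placeholders rather than values taken from Ehum.cut.

```python
# Illustrative codon fractions per amino acid.
# ASSUMPTION: these numbers are placeholders, NOT the contents of Ehum.cut.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAG": 0.58, "AAA": 0.42},
    "L": {"CTG": 0.41, "CTC": 0.20, "TTG": 0.13,
          "CTT": 0.13, "CTA": 0.07, "TTA": 0.06},
    "*": {"TGA": 0.52, "TAA": 0.26, "TAG": 0.22},
}


def most_frequent_codon(usage):
    """Map each amino acid to its single most frequently used codon."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def back_translate(protein, usage=CODON_USAGE):
    """Replace every residue with the most frequent codon for that residue.

    Raises KeyError for residues that are missing from the usage table.
    """
    best = most_frequent_codon(usage)
    return "".join(best[aa] for aa in protein.upper())


if __name__ == "__main__":
    print(back_translate("MKL"))
```

Because only the top-ranked codon is ever chosen, the same protein always gives the same nucleotide sequence for a given table, and switching to a table for another species may change the output.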

Usage

Here is a sample session with backtranseq

Note that this is a human protein, so the default human codon frequency file is used; that is, no codon usage file is specified on the command line.

Data files

The codon usage table is read by default from "Ehum.cut" in the 'data/CODONS'
directory of the EMBOSS distribution. If the name of a codon usage file
is specified on the command line, then this file will first be searched
for in the current directory and then in the 'data/CODONS' directory of
the EMBOSS distribution.

EMBOSS data files are distributed with the application and stored
in the standard EMBOSS data directory, which is defined
by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your
current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories.
Project-specific files can be put in the current directory or, for
tidier directory listings, in a subdirectory called
".embossdata". Files for all EMBOSS runs can be put in the user's home
directory or, again, in a subdirectory called ".embossdata".

The directories are searched in the following order:

. (your current directory)

.embossdata (under your current directory)

~/ (your home directory)

~/.embossdata
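The search order listed above can be mimicked with a short Python sketch. It is an assumption-laden illustration, not EMBOSS code: candidate_dirs and find_data_file are hypothetical helper names, and the sketch covers only the four user directories listed here, leaving out the standard EMBOSS data directory.

```python
from pathlib import Path


def candidate_dirs(cwd=None, home=None):
    """Return the user data directories in the order they are searched."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    return [cwd, cwd / ".embossdata", home, home / ".embossdata"]


def find_data_file(name, cwd=None, home=None):
    """Return the first existing copy of the file `name`, or None."""
    for directory in candidate_dirs(cwd, home):
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # 'Exxx.dat' is the placeholder file name used in this document.
    print(find_data_file("Exxx.dat"))
```

The first match wins, so a copy in the current directory takes precedence over one in the home directory.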

Notes

backtranseq reads a data file containing the codon usage table. The default file is Ehum.cut, the human codon usage table. Many others are available and can be selected by name with the -cfile qualifier. It is important to use one that is appropriate for the species that your protein comes from. The specified data file must exist in the current directory or in the EMBOSS data directory (see the Data files section for more information).