Hey there,
I just noticed Ensembl added selenocysteines in the latest release. Do
you know how they modelled them internally?
cheers,
Elia
On 19 Feb 2005, at 02:21, Heikki Lehvaslaiho wrote:
> Albert,
>
> I refreshed my memory (with help from Tamara Kulikova @ EBI) on how
> selenocysteine and other exceptions are handled in EMBL/GenBank.
>
> I am afraid it is a mess, partly because awareness of these cases is
> quite recent and partly because the biology itself is messy.
>
> You really need to extract the whole CDS feature from the feature
> table and look for the following three qualifiers:
>
> 1. transl_except
>    http://www.ebi.ac.uk/embl/WebFeat/qualifiers/transl_except.html
>    This tells you, in entry coordinates, where the exception is. If the
>    amino acid is not one of the known ones with an abbreviation, it is
>    named "OTHER", and there is a note qualifier with the correct name.
>
> 2. codon
>    http://www.ebi.ac.uk/embl/WebFeat/qualifiers/codon.html
>    Every occurrence of this codon in the CDS is translated to the
>    stated amino acid.
>
> 3. exception
>    http://www.ebi.ac.uk/embl/WebFeat/qualifiers/exception.html
>    If RNA editing messes up the translation so badly that the previous
>    qualifiers are not enough, you can state that this range is to be
>    replaced with these amino acids.
>
> (The one-letter codes used in the translation are here:
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.5.3)
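To make the first qualifier concrete, here is a minimal Perl sketch of
reading a transl_except value of the form (pos:START..END,aa:NAME) and
patching an already translated protein. It is only an illustration, not
Bioperl code: the helper names (parse_transl_except,
apply_transl_except), the %ONE_LETTER map and the 'X' fallback are my
own assumptions, and it only handles a simple START..END range on an
unspliced, forward-strand CDS.

```perl
use strict;
use warnings;

# Hypothetical lookup from the names used in /transl_except to one-letter
# codes. Only a few entries are listed; anything else falls back to 'X'.
my %ONE_LETTER = (Sec => 'U', Pyl => 'O', TERM => '*');

# Parse a value such as "(pos:110..112,aa:Sec)".
# Returns (start, end, aa), or an empty list if it is not a simple range.
sub parse_transl_except {
    my ($value) = @_;
    return unless $value =~ /^\(pos:(\d+)\.\.(\d+),aa:(\w+)\)$/;
    return ($1, $2, $3);
}

# Replace the residue encoded at entry coordinates start..end.
# $cds_start is the entry coordinate of the first base of the CDS
# (assumption: one exon, forward strand).
sub apply_transl_except {
    my ($protein, $cds_start, $value) = @_;
    my ($start, $end, $aa) = parse_transl_except($value)
        or return $protein;                  # unparsable: leave untouched
    my $offset = $start - $cds_start;        # 0-based offset into the CDS
    return $protein if $offset < 0 || $offset % 3;   # must be in frame
    my $index = $offset / 3;                 # 0-based residue index
    return $protein if $index >= length($protein);
    substr($protein, $index, 1) = $ONE_LETTER{$aa} // 'X';
    return $protein;
}

# A CDS starting at entry position 101 whose fourth codon (110..112) is
# an in-frame TGA that was translated as a stop.
print apply_transl_except('MAC*WK', 101, '(pos:110..112,aa:Sec)'), "\n";
```

With these inputs the stop at the fourth residue is replaced by U.
Spliced (join) or complement locations would need the entry-to-CDS
coordinate mapping that the feature location already provides.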
>
> The bottom line is, we should not touch the current translation
> implementation in Bioperl. If you want to have a go at incorporating
> alternative translations that implement some of the above, or the hack
> I suggested earlier, please put them into Bio::SeqUtils.
>
> Why don't you try your hand at writing a translation function that
> takes a Bio::Seq::RichSeq object from the Bio::SeqIO::[embl|genbank]
> parser as an argument, extracts the CDS (by name/id/order, or all of
> them), checks for exceptions AND tries to take them into account, and
> outputs the translation as a sequence object? At the same time it
> should check for the transl_table qualifier and use that to call up
> the right codon table.
>
> As you said, there should be code that can be reused in Ensembl.
>
> -Heikki
>
> On Friday 18 February 2005 14:02, Albert Vilella wrote:
>> On Fri, 2005-02-18 at 11:28 +0000, Heikki Lehvaslaiho wrote:
>>> Albert,
>>>
>>> The best way to deal with this would be to have a genetic code that
>>> correctly translates into selenocysteine. Unfortunately I could not
>>> find anything on the topic on the Taxonomy Genetic Codes home page:
>>> <http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>.
>>> I guess I should ask around if there are plans to deal with this.
>>>
>>> Are those CDSs from EMBL or GenBank? If so, could you send me a few
>>> accession numbers to check?
>>
>> From GenBank:
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=57016379
>>
>>> The translate method already has too many optional arguments, so I
>>> would rather not put in any more solely for dealing with
>>> selenocysteine.
>>
>> True.
>>
>>> Could you put together (and send to me) data lines for @NAMES,
>>> @TABLES and @STARTS in Bio::Tools::CodonTable and call it
>>> tentatively "Standard with selenocysteine", using id 20, which has
>>> been merged with existing codes and is not currently in use? That
>>> should provide a working code for your purposes while I try to find
>>> a consensus on this.
>>
>> I have added a "Standard with selenocysteine" as 20.
>> I have also added a "Bacterial with selenocysteine" as 19.
>>
>> Now, it is not apparent from the tables that 20 and 19 are only for
>> in-frame TGAs, not for stop codons in CDSs.
>>
>> I've seen an email from Ewan on bioperl-l from July 2004 saying that
>> they solved that problem in Ensembl, but I haven't found how they did
>> it in their code:
>> http://portal.open-bio.org/pipermail/bioperl-l/2004-July/016363.html
>>
>> Albert.
>>
>> **************
>>
>> @NAMES = #id
>> (
>> 'Standard', #1
>> 'Vertebrate Mitochondrial',#2
>> 'Yeast Mitochondrial',# 3
>> 'Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma',#4
>> 'Invertebrate Mitochondrial',#5
>> 'Ciliate, Dasycladacean and Hexamita Nuclear',# 6
>> '', '',
>> 'Echinoderm Mitochondrial',#9
>> 'Euplotid Nuclear',#10
>> '"Bacterial"',# 11
>> 'Alternative Yeast Nuclear',# 12
>> 'Ascidian Mitochondrial',# 13
>> 'Flatworm Mitochondrial',# 14
>> 'Blepharisma Nuclear',# 15
>> 'Chlorophycean Mitochondrial',# 16
>> '', '', # 17 and 18 are unused, so 19 and 20 line up with @TABLES and @STARTS
>> 'Bacterial with selenocysteine', # 19
>> 'Standard with selenocysteine', # 20
>> 'Trematode Mitochondrial',# 21
>> 'Scenedesmus obliquus Mitochondrial', #22
>> 'Thraustochytrium Mitochondrial' #23
>> );
>>
>> @TABLES =
>> qw(
>> FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
>> FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> '' ''
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
>> FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> '' ''
>> FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
>> FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>> );
>>
>> @STARTS =
>> qw(
>> ---M---------------M---------------M----------------------------
>> --------------------------------MMMM---------------M------------
>> ----------------------------------MM----------------------------
>> --MM---------------M------------MMMM---------------M------------
>> ---M----------------------------MMMM---------------M------------
>> -----------------------------------M----------------------------
>> '' ''
>> -----------------------------------M----------------------------
>> -----------------------------------M----------------------------
>> ---M---------------M------------MMMM---------------M------------
>> -------------------M---------------M----------------------------
>> -----------------------------------M----------------------------
>> -----------------------------------M----------------------------
>> -----------------------------------M----------------------------
>> -----------------------------------M----------------------------
>> '' ''
>> ---M---------------M------------MMMM---------------M------------
>> ---M---------------M---------------M----------------------------
>> -----------------------------------M---------------M------------
>> -----------------------------------M----------------------------
>> --------------------------------M--M---------------M------------
>> );
>>
>> **************
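For anyone reading the table strings above: each 64-character entry
lists the amino acids for all codons with the bases ordered T, C, A, G,
so a codon sits at position 16*first + 4*second + third. The sketch
below is plain Perl, not the Bio::Tools::CodonTable implementation;
translate_with_table and the %TABLE and %BASE hashes are names I made
up for illustration. It copies the id 1 string and the proposed id 20
string and shows the limitation Albert mentions: with id 20 every TGA
becomes U, including a terminal stop.

```perl
use strict;
use warnings;

# Amino-acid strings copied from the @TABLES data above:
# id 1 "Standard" and the proposed id 20.
my %TABLE = (
    1  => 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
    20 => 'FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
);

# Base order used by the table strings.
my %BASE = (T => 0, C => 1, A => 2, G => 3);

# Translate a DNA string codon by codon using the table with the given id.
# Codons containing anything other than A, C, G or T give 'X'; a trailing
# partial codon is ignored.
sub translate_with_table {
    my ($dna, $id) = @_;
    my $table = $TABLE{$id} or die "unknown table id $id";
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my @b = map { $BASE{$_} } split //, uc(substr($dna, $i, 3));
        if (grep { !defined $_ } @b) { $protein .= 'X'; next; }
        $protein .= substr($table, 16 * $b[0] + 4 * $b[1] + $b[2], 1);
    }
    return $protein;
}

my $cds = 'ATGTGCTGAGGATGA';    # ATG TGC TGA GGA TGA
print translate_with_table($cds, 1),  "\n";   # MC*G*
print translate_with_table($cds, 20), "\n";   # MCUGU
```

The last codon here is meant as the real stop, yet table 20 reports it
as U. A table alone cannot tell the two TGAs apart, which is why the
position information in the CDS feature is still needed.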
> --
> Heikki Lehvaslaiho, heikki at_ebi _ac _uk
> http://www.ebi.ac.uk/mutations/
> EMBL Outstation, European Bioinformatics Institute
> Wellcome Trust Genome Campus, Hinxton
> Cambridge, CB10 1SD, United Kingdom
> Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468