Hello fellow netters,
I would like to generate a profile which can be used to test new sequences
for the occurrence of a pleckstrin domain. I have tried to generate a
profile with the GCG program PROFILESEARCH, using the alignments given
in Musacchio et al., Trends Biochem. Sci. 18, 343-348, 1993. Since
the authors used a different program, which they claim to be
superior, I started by aligning a few sequences according to
the paper and using that alignment to search for new members of the
pleckstrin domain family.
Unfortunately, I have found no reliable way to discriminate
true positives from false hits.
I started with the sequences marked in Table I of the paper as members
of the first profile. A PROFILESEARCH run against SwissProt gives the
following result (I have shortened the output for brevity):
Orig. Sequence with description
20.99 SWISS:GTPA_HUMAN Ras GAP (member of original profile)
20.98 SWISS:GTPA_BOVIN Ras GAP
19.13 SWISS:GNRP_RAT Guanine nucleotide releasing protein (member of original profile)
17.89 SWISS:OXYB_HUMAN Oxysterol-binding protein (member of original profile)
17.74 SWISS:OXYB_RABIT Oxysterol-binding protein
17.33 SWISS:DYN1_RAT Dynamin-1 (member of original profile)
17.18 SWISS:SOS_DROME Son of sevenless protein (member of original profile)
16.21 SWISS:P47_HUMAN Pleckstrin (member of original profile)
15.32 SWISS:DYN_DROME Dynamin = shibire protein
13.01 SWISS:SPCB_DROME Spectrin beta chain
12.57 SWISS:TYK2_HUMAN Non-receptor tyrosine-protein kinase tyk2
12.56 SWISS:PIP2_HUMAN PLC-gamma
12.34 SWISS:PIP2_BOVIN PLC-gamma
12.22 SWISS:ARK1_RAT beta-adrenergic receptor kinase 1
12.19 SWISS:ARK1_BOVIN beta-adrenergic receptor kinase 1
12.19 SWISS:ARK1_HUMAN beta-adrenergic receptor kinase 1
12.15 SWISS:RRPB_IBVB RNA-directed RNA polymerase (from a virus)
12.13 SWISS:PIP2_RAT PLC-gamma
12.12 SWISS:ACHG_CHICK Acetylcholine receptor, gamma chain precursor
12.06 SWISS:ARK2_RAT beta-adrenergic receptor kinase 2
12.05 SWISS:SDHD_ECOLI D-serine dehydratase
12.05 SWISS:PIP4_HUMAN PLC-IV
11.90 SWISS:PIP4_RAT PLC-IV
11.86 SWISS:ENV_SIVAT gp160 precursor
..
Two things confuse me:
a) All of the sequences with clearly high scores are already members of
the profile or closely related to a member of the profile.
b) While most of the top-scoring sequences are considered by Musacchio et al.
to be members of the pleckstrin domain-containing group of proteins,
some are not, e.g. D-serine dehydratase from E. coli (SDHD_ECOLI) or
the viral RNA-directed RNA polymerase (RRPB_IBVB). How can these
proteins be excluded from the pleckstrin family with certainty?
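
To make point a) concrete, here is a minimal Python sketch (my own
illustration, not part of GCG) that takes the scores from the listing
above and looks for the largest drop between consecutive hits. The names
HITS and largest_gap are made up for this example, and a score gap is
only a rough heuristic, not a statistical test.

```python
# Scores and SwissProt IDs copied from the abbreviated listing above.
HITS = [
    (20.99, "GTPA_HUMAN"), (20.98, "GTPA_BOVIN"), (19.13, "GNRP_RAT"),
    (17.89, "OXYB_HUMAN"), (17.74, "OXYB_RABIT"), (17.33, "DYN1_RAT"),
    (17.18, "SOS_DROME"), (16.21, "P47_HUMAN"), (15.32, "DYN_DROME"),
    (13.01, "SPCB_DROME"), (12.57, "TYK2_HUMAN"), (12.56, "PIP2_HUMAN"),
    (12.34, "PIP2_BOVIN"), (12.22, "ARK1_RAT"), (12.19, "ARK1_BOVIN"),
    (12.19, "ARK1_HUMAN"), (12.15, "RRPB_IBVB"), (12.13, "PIP2_RAT"),
    (12.12, "ACHG_CHICK"), (12.06, "ARK2_RAT"), (12.05, "SDHD_ECOLI"),
    (12.05, "PIP4_HUMAN"), (11.90, "PIP4_RAT"), (11.86, "ENV_SIVAT"),
]


def largest_gap(hits):
    """Sort hits by descending score and find the largest score drop.

    Returns (ordered_hits, gap, i), where i is the index of the last
    hit above the drop.
    """
    ordered = sorted(hits, reverse=True)
    gaps = [(ordered[i][0] - ordered[i + 1][0], i)
            for i in range(len(ordered) - 1)]
    gap, i = max(gaps)
    return ordered, gap, i


ordered, gap, i = largest_gap(HITS)
above = [name for _, name in ordered[:i + 1]]   # hits above the drop
below = [name for _, name in ordered[i + 1:]]   # hits below the drop

print(f"largest drop: {gap:.2f} after {above[-1]}")
print("above:", ", ".join(above))
print("below:", ", ".join(below))
```

On these numbers the largest drop (about 2.31) falls between DYN_DROME
and SPCB_DROME. Everything above it is a member of the original profile
or a close relative of one, and everything below it is the group where
I cannot tell true positives from false hits.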
--Cornelius.
--
/* Cornelius Krasel, Abt. Lohse, Genzentrum, D-82152 Martinsried, Germany */
/* email: krasel at alf.biochem.mpg.de fax: +49 89 8578 3795 */
/* "People are DNA's way of making more DNA." (Edward O. Wilson, 1975) */