Transcription factor software (Steve Thompson's concerns)

In <920520165309.20200935 at BOBCAT.CSC.WSU.EDU> THOMPSON at WSUVMS1.CSC.WSU.EDU writes:
> Fellow Netlanders--
>
> In reply to a discussion on software for accessing Dr. Gosh's Transcription
> Factor Database Michael Weise writes:
>
> {text deleted}
> >
> > The TFD is available in the file SITEDATA.GCG (available via anon.
> >ftp from ncbi.nlm.nih.gov in /repository/TFD/datasets). Feedback from tech
> >
> {much stuff deleted}
>
> >necessary to first create a TFD.Patterns file (with a format like that in
> >GCG's Prosite.Patterns ) and a set of .TFdoc files using the information
> >found in SITEDATA.GCG (while the GCG package has a TFsites.DAT file, it
> >doesn't contain all the information found in SITEDATA.GCG). In creating
>
> {more stuff deleted}
>
> However, when I ftp'ed SITEDATA.GCG over and compared it to our own GCG version
> of TFsites.DAT I didn't recognize any differences.
{ results of VMS dif deleted }
>
> Yet Micheal claims that TFsites.DAT doesn't have as much information as
> SITEDATA.GCG. What's going on? Might it be that Micheal's version of
> TFsites.DAT is not current? Regardless, Thank's for the tips; we will pursue
> the modifications and use MOTIFS as Micheal suggests.
>
> Steve Thompson
> Steven M. Thompson
> Consultant in Molecular Genetics and Sequence Analysis
> VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
> Washington State University, Pullman, WA 99164-1224, USA
> AT&Tnet: (509) 335-0533 or 335-3179 FAX: (509) 335-0540
> BITnet: THOMPSON at WSUVMS1 or STEVET at WSUVM1
> INTERnet: THOMPSON at wsuvms1.csc.wsu.edu
Well, when we set up TF_Motifs, we compared the GCG v. 7 file TFsites.DAT to
SITEDATA.GCG and found them to be different (the SITEDATA file contained names
of transcription factors associated with sites, whereas the .DAT file didn't).
In GCG v. 7.1, the two files are identical in content, so Steve is correct in
what he sees with his dif. However, just having an updated TFsites.DAT does
not by itself let Motifs search NT sequences for TF sites. It is still
necessary to have our program read the info in this file and create the
.PATTERNS file and the set of .TFdoc files.
Sorry if this has caused problems. My to-do list DOES have an upgrade to 7.1 as
an item; it's just been difficult getting around to it.
MJW
PS. The fun part of all this is that Help at GCG.COM told me that Motifs wouldn't
be able to analyze NT sequences (the first info I've gotten from them that
missed the mark). What makes it work is: 1) A, T, G, C are valid symbols in both
the NT and AA alphabets, and 2) you can expand TFsites.DAT patterns [e.g., GGAKGA]
so they don't contain any ambiguous NTs, and thus make them look like Prosite
patterns [e.g., GGA(G,T)GA], which Motifs can readily use (this is one of the
things our program does). In that way, Motifs doesn't know - or care - that
it's working with an NT seq instead of an AA seq.
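
For anyone who wants to see the expansion step in point 2 spelled out, here is
a minimal sketch in Python. It is not the TF_Motifs program itself, only an
illustration: the function name expand_pattern and the table name IUPAC are
made up for this example, and the table is the standard IUPAC nucleotide
ambiguity code set. Whether Motifs would want N written as (A,C,G,T) or as a
wildcard is an assumption I have not checked.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_pattern(site):
    """Rewrite a site string so every ambiguous code becomes a
    parenthesized, comma-separated list of plain bases."""
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]  # raises KeyError on a non-IUPAC symbol
        if len(bases) == 1:
            parts.append(bases)
        else:
            parts.append("(" + ",".join(bases) + ")")
    return "".join(parts)


print(expand_pattern("GGAKGA"))  # GGA(G,T)GA
```

The output for GGAKGA matches the example in the PS above; a site with no
ambiguous codes passes through unchanged.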
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/ Michael J. Weise, Ph.D. \ Univ.of Ga. BioScience Computing Facility \
( weise at bscf.uga.edu \ Dept.of Genetics UGa, Athens GA 30602 )
\ _ _ _'Tis_only_me_speak'n._ _\_ _ _ _ _ _ _ (706) 542-1409_ _ _ _ _ _ _ /