Stefan Kirov wrote:
> Sendu Bala wrote:
>> I'm looking to extract data from some Transcription Factor Binding
>> Site (TFBS) databases. For example, matrix, sequence and known
>> position information out of Transfac flatfiles.
>> Currently there is Bio::Matrix::PSM::IO::transfac, but it only gives
>> you the PSM matrices, not the 'instance' sequences. Bio::Matrix::PSM
>> also has this to say:
> Transfac is not an open database so, you cannot get the instance data
> anyway.
You can. It is in the sites.dat file and often in the matrix.dat file.
It is also available freely and publicly via at least 2 websites.
> There was a discussion on that recently. Since Bioperl is
> completely open project, I am not sure it makes sense to put efforts
> into supporting something that is not open- even if you have access to
> the data files (which I believe Transfac does not allow in general)
It does allow it; you just have to pay for fast access to the latest
data. Or you can use older data for free via the web. A Bio::DB module
could provide access to either.
> how the rest of us can use it or debug/support it?
It may be possible to include a small example subset of the data in
t/data; there is after all already t/data/transfac.dat (which is a small
matrix.dat file).
In any case, I don't see that your argument is valid. Why should bioperl
be restricted to only dealing with 'open' data sources? If someone is
willing to develop and maintain a module that deals with a data source,
it makes no difference if that source is open or not - it is useful
either way to other people who also have access to that data. If there
comes a time that the maintainer can no longer maintain it and it stops
working because the data format changes, and no one knows the new
format, it can be deprecated.
Is there some 'popularity' threshold that must be passed before it is
'worth' adding a database module to Bioperl? Why should there be one?
The cost of having one is a few kB of disc storage space; the benefit is
extremely large to the person who might want to use it. There may be an
argument that core shouldn't become cluttered with too much stuff that
the majority of people won't use, but how is that line drawn? I don't
personally use the majority of bioperl modules, but I don't think they
should all be removed. And clearly the idea of having PWM- and
Transfac-related modules in bioperl has been deemed acceptable in the
past, or we wouldn't have Bio::Matrix::PSM::IO::transfac.