Hi,
I want to fetch sequences from the soybean genome according to a GFF
file. My GFF3 file and genome file are attached to this email, because
the format is not easy to recognize if I paste it into the message.
The tool keeps reporting this error:
An error occurred running this job: Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/extract/extract_genomic_dna.py", line 288, in <module>
    if __name__ == "__main__": __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/extract/extract_genomic_dna.py"
Could you please tell me where the problem is?
Best
Qianli

Hello Qianli,
This appears to be the same data that was submitted with a recent bug
report. Converting the query coordinates to BED format is still the
recommendation. This should be a good solution for most, if not all,
of your prior Extract tool failures, and it is a good method overall.
First, run "Convert Formats -> GFF-to-BED". Then click on the pencil
icon and, on the "Edit Attributes" form, assign the last three columns
so that c4 = name, c5 = score, and c6 = strand. In particular, you will
want to get strand assigned. The datatype will be bed. Then extract
using your custom genome, and each sequence will be titled by its
region coordinates.
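
If it helps to see what the conversion does to the coordinates, here is
a minimal Python sketch of turning one GFF3 feature line into the six
BED columns (chrom, start, end, name, score, strand). This is only an
illustration and is not the code behind Galaxy's GFF-to-BED converter.
The function name gff_line_to_bed, the choice to use the ID attribute as
the name, and the example identifiers are all assumptions made for the
example.

```python
def gff_line_to_bed(line):
    """Convert one tab-separated GFF3 feature line to a BED6 tuple.

    Returns None for blank lines and comment/directive lines.
    Hypothetical helper for illustration only.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected 9 tab-separated GFF columns, got %d" % len(cols))
    seqid, _source, ftype, start, end, score, strand, _phase, attrs = cols[:9]

    # GFF coordinates are 1-based and inclusive; BED coordinates are
    # 0-based and half-open, so only the start shifts by one.
    bed_start = int(start) - 1
    bed_end = int(end)

    # Assumption: use the ID attribute as the name (c4), falling back
    # to the feature type when no ID is present.
    name = ftype
    for field in attrs.split(";"):
        if field.startswith("ID="):
            name = field[3:]
            break

    bed_score = "0" if score == "." else score               # c5
    bed_strand = strand if strand in ("+", "-") else "."     # c6
    return (seqid, bed_start, bed_end, name, bed_score, bed_strand)


# Made-up example line; the identifiers are not from the attached files.
example = "Gm01\texample\tgene\t101\t200\t.\t-\t.\tID=gene0001;Name=demo"
print("\t".join(str(x) for x in gff_line_to_bed(example)))
```

The main points are the one-base shift of the start coordinate and that
strand ends up in column 6, which is the column the Extract tool needs
in order to return minus-strand features in the correct orientation.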
Best,
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org

Hello,
Yes, MEME is not on the Main server, but it can be used in local,
cloud, or slipstream Galaxy installs. For throughput, there are a few
MEME-related repositories in the Tool Shed to choose from. How many
sequences each can process will likely vary and depends on the hardware
the Galaxy instance runs on. Contacting the tool authors is one path,
or you can test with your actual data. Data composition could be an
important factor (not just the number of sequences).
For the Sequence Logo Generator, I do not know of a hard limit in the
tool itself. But because the recommended/supported input is ClustalW
output, ClustalW will most likely set the practical upper limit when
using the public Main Galaxy instance. Testing will be the best way to
learn the limits for your particular data (whether nucleotide or
protein), but the success range will be capped in the thousands, not
millions, of sequences (and possibly lower as sequence length
increases). If there is a memory problem or the run time exceeds the
limits on the public server, the job will end with an error. Moving to
a scaled-up server, such as a cloud Galaxy, will give you more control
over these types of variables.
Some benchmarks are in the Clustal W publication:
http://bioinformatics.oxfordjournals.org/content/23/21/2947.long
If others would like to post benchmarks from their own experience,
that would be welcome!
Jen
Galaxy team
--
Jennifer Hillman-Jackson
http://galaxyproject.org