Hello,
My account login is:
johnsonko@ninds.nih.gov
I am a first time Galaxy user.
I have uploaded my sequences into Galaxy as format "fastq" and would
like to use "Groomer" next to convert them to Sanger fastq format, so
that I can explore quality via box plot, decide on a trim length (if
any), and map to the genome using bwa or bowtie.
However, I am running into a problem using "Groomer".
I do not know which format my sequences are in, so I cannot set the
required input parameter.
An example of my sequences is as follows:
@SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX
NATTTATGGATAGTTGGGTAGTAGGTGTAAATGTATGTGGTAAAAGGCCTAGGAGATTTGTTGATCCAAT
AAATATGATTAGGGAAACAA
+SNPSTER6_0679:1:1:1083:939#0/1
BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_
TROOWTWTWTQWQWTTRWRO
How can I tell whether I have "Sanger", "Solexa", "Illumina 1.3+", etc.?
I have submitted to "Groomer" several times, trying these options one
at a time, and none of the jobs returned results.
Need help please.
Also, what is the expected time for "Groomer" to return results for a
file containing 2.7 million reads?
Thank you ... best,
Kory
Kory R. Johnson, MS, PhD
Sr. Bioinformatics Scientist
www.kellygovernmentsolutions.com
Providing Contract Services For:
Bioinformatics Section,
Information Technology & Bioinformatics Program,
Division of Intramural Research (DIR),
National Institute of Neurological Disorders & Stroke (NINDS),
National Institutes of Health (NIH),
Bethesda, Maryland
Mailing Address:
NINDS/NIH
Clinical Center (Building 10)
Office 5S223
9000 Rockville Pike
Bethesda, MD 20892
Contact Information:
Phone: 301-402-1956
Fax: 301-480-3563
email: johnsonko@ninds.nih.gov

Hi Kory,
The problem with this FASTQ block is that the identifier lines for the
sequence and the quality scores do not match
('SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX' vs
'SNPSTER6_0679:1:1:1083:939#0/1'): the sequence identifier has
additional text that is not found on the quality score identifier,
which is not valid for the FASTQ format. If the quality score line
repeats the identifier, it must match the sequence identifier exactly.
Alternatively, the quality score identifier line can be only a '+',
without the sequence identifier.
The quality score lines appear to be either Illumina or Solexa (both
use an ASCII offset of 64), but it is best to check with the source of
the data to be sure:
Input ASCII range: 'B'(66) - '_'(95)
Input decimal range (offset 64): 2 - 31
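For anyone who wants to reproduce those two ranges, here is a minimal Python sketch. The function names are made up for illustration and are not part of Galaxy or Groomer, and the thresholds are only the conventional lower bounds of each encoding (Sanger starts at ASCII 33, Solexa at 59, Illumina 1.3+ at 64). Treat the result as a hint, not a definitive answer; the data source remains the authority.

```python
# Quality string from the example read, joined back into one line.
SAMPLE_QUALITY = (
    "BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_"
    "TROOWTWTWTQWQWTTRWRO"
)


def ascii_range(quality):
    """Return the (lowest, highest) ASCII code found in a quality string."""
    codes = [ord(ch) for ch in quality]
    return min(codes), max(codes)


def decimal_range(quality, offset):
    """Return the (lowest, highest) score after subtracting the ASCII offset."""
    low, high = ascii_range(quality)
    return low - offset, high - offset


def plausible_encodings(quality):
    """Heuristic guess based only on the lowest character seen.

    Codes below 59 can only be Sanger; codes 59-63 point to Solexa.
    From 64 upwards the data is most likely offset-64 (Solexa or
    Illumina 1.3+), although Sanger cannot be strictly ruled out.
    """
    low, _ = ascii_range(quality)
    if low < 59:
        return ["Sanger"]
    if low < 64:
        return ["Solexa"]
    return ["Solexa", "Illumina 1.3+"]


print(ascii_range(SAMPLE_QUALITY))        # ASCII codes of 'B' and '_'
print(decimal_range(SAMPLE_QUALITY, 64))  # scores under an offset of 64
print(plausible_encodings(SAMPLE_QUALITY))
```

On a real file the same check should be run over many reads, since a single read may not show the full range of scores.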
You'll need to upload valid FASTQ files in order to work with them in
Galaxy. Valid versions of the read you provided are:
@SNPSTER6_0679:1:1:1083:939#0/1
NATTTATGGATAGTTGGGTAGTAGGTGTAAATGTATGTGGTAAAAGGCCTAGGAGATTTGTTGATCCAAT
AAATATGATTAGGGAAACAA
+SNPSTER6_0679:1:1:1083:939#0/1
BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_
TROOWTWTWTQWQWTTRWRO
or
@SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX
NATTTATGGATAGTTGGGTAGTAGGTGTAAATGTATGTGGTAAAAGGCCTAGGAGATTTGTTGATCCAAT
AAATATGATTAGGGAAACAA
+
BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_
TROOWTWTWTQWQWTTRWRO
or
@SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX
NATTTATGGATAGTTGGGTAGTAGGTGTAAATGTATGTGGTAAAAGGCCTAGGAGATTTGTTGATCCAAT
AAATATGATTAGGGAAACAA
+SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX
BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_
TROOWTWTWTQWQWTTRWRO
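If you need to fix the whole file before uploading it again, the second form (a bare '+') is the easiest to produce. Below is a minimal Python sketch of that fix. It assumes every record in your file occupies exactly four lines (the sequence and quality lines above are wrapped only by the email), and the function name is hypothetical, not a Galaxy tool.

```python
def normalize_plus_lines(lines):
    """Yield FASTQ lines with every third line of a record reduced to '+'.

    Assumes strict four-line records: identifier, sequence, '+' line,
    quality. Raises ValueError if the third line does not start with '+'.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\r\n")
        if index % 4 == 2:
            if not text.startswith("+"):
                raise ValueError(
                    f"line {index + 1}: expected a '+' line, got {text!r}"
                )
            yield "+"
        else:
            yield text


record = [
    "@SNPSTER6_0679:1:1:1083:939#0/1 run=100908_SNPSTER6_0679_70929AAXX",
    "NATTTATGGATAGTTGGGTAGTAGGTGTAAATGTATGTGGTAAAAGGCCTAGGAGATTTGTTGATCCAAT"
    "AAATATGATTAGGGAAACAA",
    "+SNPSTER6_0679:1:1:1083:939#0/1",
    "BIQQIQQQTP[[[[[VVVVQPPPPPTWWWW[[YYTTTOVV____TWVXRWPTQPQWWWWWTOOVV___V_"
    "TROOWTWTWTQWQWTTRWRO",
]

fixed = list(normalize_plus_lines(record))
print("\n".join(fixed))
```

To process a file, pass an open file handle as `lines` and write each yielded line, followed by a newline, to a new file.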
Please let us know if we can be of further assistance.
Thanks for using Galaxy,
Dan