3/18/15 Annotated bibliographies and project summaries.

Copy this text to your wiki and fill in these statements to outline your project.

My general topic of interest:

My research question:

The data I will use:

Where I will find my data:

The approach/method/tool I will use:

My goal: I hope to create a figure/table that looks like this:

I will need help in this area:

The annotated bibliography is due today. You need to know what you're looking for and why it might be interesting! Deeksha and Lasya's wiki has a great example of what you should be doing for your annotated bibliography.

3/17/15: Kristen Wade is a TA for the second half of the semester and she is here to help! If you want to contact her, you can email her at wadekj@vcu.edu! Her lab is Trani 311.

Phrodo was sequenced by Illumina MiSeq next-generation sequencing. We have almost 1,000,000 sequencing reads, and each read is 150 nucleotides long. I took 100,000 of those reads (~10%) and assembled them into a contig using Newbler. We can view that assembly with the software Consed. The 'assembly view' plot of read coverage (y-axis) vs. genome length (x-axis) looks like this:

We see read coverage is consistent across the genome, except for a small plateau where read coverage is higher.
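As a rough check on the coverage plot, the expected average read depth can be estimated as (number of reads × read length) / genome length. The sketch below uses the read count and read length given above; the genome length of 150,000 is only a hypothetical placeholder for illustration, not Phrodo's measured length.

```python
def expected_coverage(num_reads, read_length, genome_length):
    """Estimate mean read depth as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# 100,000 reads of 150 nt come from the notes above.
# HYPOTHETICAL genome length, used only to show the arithmetic.
placeholder_genome_length = 150_000
depth = expected_coverage(100_000, 150, placeholder_genome_length)
print(f"Expected mean depth: {depth:.0f}x")
```

Swap in the real contig length from Consed to get an estimate you can compare against the depth values you read off the assembly.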

Last Wednesday we looked for the boundaries of the plateau. These are the suspected genome ends, and the doubling of read coverage suggests we should see a "wall" where a lot of reads start or end at the same position. We found this wall at position 13758 (note that the red * is a fake space, so the reads actually align with the first G at 13758, not 13757).

And the other side of the "wall" at 16161:

You can check read coverage under the Misc tab, Depth of Coverage at Cursor tool.

Genome position    Read depth
13756              99
13757              286
16161              195
16162              117
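The "wall" shows up as a sharp change in depth between two neighboring positions. A minimal sketch of that check, using only the four depth values listed above, is shown here; the function name and the 1.5-fold cutoff are assumptions chosen for illustration, not settings from Consed.

```python
def find_depth_jumps(depths, min_ratio=1.5):
    """Return (pos, next_pos, depth, next_depth) for adjacent positions
    whose depths differ by at least min_ratio-fold."""
    jumps = []
    positions = sorted(depths)
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev != 1:
            continue  # only compare positions that are directly adjacent
        a, b = depths[prev], depths[cur]
        ratio = max(a, b) / max(1, min(a, b))  # guard against zero depth
        if ratio >= min_ratio:
            jumps.append((prev, cur, a, b))
    return jumps


# Depth values read from Consed (Misc tab, Depth of Coverage at Cursor).
depths = {13756: 99, 13757: 286, 16161: 195, 16162: 117}
jumps = find_depth_jumps(depths)
for prev, cur, a, b in jumps:
    print(f"{prev} -> {cur}: depth {a} -> {b}")
```

With a full per-position depth export, the same function would scan the whole contig; here it only confirms the two boundaries we found by eye.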

When we export the final consensus genome, we will export the sequence with the terminal repeat (13758-16161) at both ends of the genome. The diagram below is a schematic of what the genome will look like, where the red rectangle is the terminal repeat sequence.
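One way to picture the export: start the genome at the first base of the terminal repeat, read to the end of the contig, wrap around to the beginning, and stop at the last base of the repeat, so the repeat appears at both ends. The sketch below assumes the contig is a circularly permuted assembly of the whole genome, and it uses a short made-up sequence with made-up repeat coordinates, not Phrodo's.

```python
def linearize_with_terminal_repeat(contig, repeat_start, repeat_end):
    """Return the linear genome with the repeat at both ends.

    repeat_start and repeat_end are 1-based, inclusive contig positions.
    ASSUMPTION: the contig represents the full genome as a circle.
    """
    if not (1 <= repeat_start <= repeat_end <= len(contig)):
        raise ValueError("repeat coordinates must lie within the contig")
    # From the repeat start to the contig end, then from the contig
    # start through the end of the repeat (the second copy).
    return contig[repeat_start - 1:] + contig[:repeat_end]


# Toy example: a 10-base contig whose "repeat" is positions 4-6 (CCG).
toy_contig = "AAACCGGTTT"
genome = linearize_with_terminal_repeat(toy_contig, 4, 6)
print(genome)
```

In the toy result the repeat CCG sits at both the start and the end, and the exported sequence is longer than the contig by exactly the repeat length.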

We found two weird regions in the genome, but didn't make any changes to the final genome consensus sequence:

At position 147018, we found a SNP with 2289 reads containing a T and 1450 reads containing a G. The software calls a T in consensus and we will keep that as the majority nucleotide. Note to check position 133261 (147018-13757) during annotation to see if this SNP changes an amino acid in a protein.
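The numbers in this note can be checked with a few lines of arithmetic. This is just a sketch: the function names are made up for illustration, and the offset of 13757 is the one used in the note above (contig position minus 13757 gives the position in the final genome).

```python
def allele_fractions(counts):
    """Convert per-base read counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    return {base: n / total for base, n in counts.items()}


def to_final_coordinate(contig_position, offset=13757):
    """Shift a contig position into final-genome coordinates."""
    return contig_position - offset


fractions = allele_fractions({"T": 2289, "G": 1450})
majority = max(fractions, key=fractions.get)
print(f"Majority base: {majority} ({fractions[majority]:.1%} of reads)")
print(f"Position to check during annotation: {to_final_coordinate(147018)}")
```

T is the majority base, but G is carried by well over a third of the reads, which is why the position is worth a second look during annotation.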

A region around position 61350 has a bunch of reads (about 60 nucleotides long?) that are not incorporated into the assembly, so we are ignoring them. Many of these reads are good-quality sequence, and the sequence that is not incorporated into the assembly isn't found anywhere else in the genome.

The GC content is 37.8%.
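GC content is the percentage of bases in the sequence that are G or C. A minimal sketch of the calculation is below; the short sequence is a made-up example, and ambiguous bases such as N are assumed to be left out of the total.

```python
def gc_content(sequence):
    """Return GC content as a percentage of the A/C/G/T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence: 2 of 8 bases are G or C.
toy_gc = gc_content("ATGCATAT")
print(f"GC content: {toy_gc:.1f}%")
```

Running the same function on the exported consensus sequence should give a value close to the one reported above.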

The winners of the paper DNA assembly exercise were Damian, Jordan, and Emma!

January 26th: Annotation of the first gene.

Original Glimmer call @bp 1184 has strength 15.18
SSC:
CP: includes all coding potential
SD: SD score is 735, and is the highest
SCS: predicted by G and GM
GAP: first gene, no 5' gene
BLAST: 1:1 match to gp61 of phage B4
LO: longest open reading frame
ST:
F: