Python: Adding Read Group (@RG) tags to BAM or SAM files
http://www.ngcrawford.com/2012/04/17/python-adding-read-group-rg-tags-to-bam-or-sam-files/
Tue, 17 Apr 2012 16:44:11 +0000

The SAM specification defines @RG (read group) tags, and tools such as GATK now expect them in all SAM/BAM alignments. If you are using GATK you have probably noticed that it will not run without them. Since @RG tags weren’t standard practice until recently, I’ve written a script to add them in post hoc. You’ll need to install pysam and Python 2.7 to get it to work.
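To give a rough idea of what adding read groups involves, here is a minimal sketch that works on plain SAM text with only the standard library. It is not the pysam script mentioned above. The function name `add_read_group` and its `rg_id` and `sample` parameters are made up for illustration, and it assumes the input has no @RG header lines yet.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines with one @RG header line and an RG:Z tag on every read.

    Hypothetical helper: assumes the input is plain SAM text with no
    existing @RG header lines.
    """
    rg_header = "@RG\tID:{0}\tSM:{1}".format(rg_id, sample)
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Existing header lines pass through unchanged.
            yield line
            continue
        if not header_written:
            # First alignment record: emit the @RG line right before it.
            yield rg_header
            header_written = True
        fields = line.split("\t")
        # The first 11 columns are mandatory; optional tags follow.
        # Drop any stale RG tag before appending the new one.
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        yield "\t".join(fields[:11] + tags + ["RG:Z:" + rg_id])
    if not header_written:
        # Header-only input still gets its @RG line.
        yield rg_header
```

Something like `for line in add_read_group(open("in.sam"), "grp1", "sampleA"): print(line)` would write the tagged SAM to standard output; for BAM input you would convert to SAM first or use pysam.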

Bowtie2 output as BAM
http://www.ngcrawford.com/2012/03/14/bowtie2-output-as-bam/
Wed, 14 Mar 2012 14:01:46 +0000

Bowtie2 is a short-read aligner optimized for reads of 50 bp or longer. I’ve been playing around with it and was initially puzzled by the fact that it only outputs SAM-formatted alignments. Then I realized you can pipe the output straight into samtools, which will convert it to BAM for you.
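The piping idea can be sketched from Python with `subprocess`. This is only an illustration under a few assumptions: `bowtie2` and `samtools` are on your PATH, the reads are unpaired (`-U`), and the helper names `build_commands` and `align_to_bam` plus all file names are hypothetical. The `-S` flag was needed by older samtools releases to mark the input as SAM; newer ones detect the format themselves.

```python
import subprocess


def build_commands(index_prefix, reads_fastq, out_bam):
    """Return the (bowtie2, samtools) argument lists for the pipeline."""
    # bowtie2 writes SAM to standard output when no output file is given.
    bowtie2_cmd = ["bowtie2", "-x", index_prefix, "-U", reads_fastq]
    # "-" tells samtools to read SAM from standard input; -b writes BAM.
    samtools_cmd = ["samtools", "view", "-b", "-S", "-o", out_bam, "-"]
    return bowtie2_cmd, samtools_cmd


def align_to_bam(index_prefix, reads_fastq, out_bam):
    """Run bowtie2 and stream its SAM output into samtools as BAM."""
    bowtie2_cmd, samtools_cmd = build_commands(index_prefix, reads_fastq, out_bam)
    aligner = subprocess.Popen(bowtie2_cmd, stdout=subprocess.PIPE)
    converter = subprocess.Popen(samtools_cmd, stdin=aligner.stdout)
    # Close our copy of the pipe so bowtie2 sees a broken pipe
    # if samtools exits early.
    aligner.stdout.close()
    converter_status = converter.wait()
    aligner_status = aligner.wait()
    if aligner_status != 0 or converter_status != 0:
        raise RuntimeError("bowtie2 | samtools pipeline failed")
```

Calling `align_to_bam("my_index", "reads.fq", "aligned.bam")` would then leave only the BAM file on disk, with no intermediate SAM.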

MultiMarkdown
http://www.ngcrawford.com/2010/06/09/multimarkdown/
Wed, 09 Jun 2010 15:56:58 +0000

Since I started using GitHub in a serious way back in January, I’ve begun writing my documentation in the Markdown format that displays so nicely on GitHub. Markdown is essentially a parsing tool and a simple text syntax that allows easy conversion of human-readable text to HTML. It’s intuitive, it took less than 5 minutes to pick up, and it saves me a ton of time not writing HTML. However, its ease of use is tempered, a bit, by a lack of features. Although it is easy to create headers, lists, and code blocks (simple HTML stuff), it doesn’t include the option to create tables, formatted mathematical formulas, citations, or bibliographies. Since I’m a scientist who wants to produce documents with these sorts of features, this is annoying.

Luckily, the Markdown syntax has recently been extended, in a project called MultiMarkdown, to include many of the aforementioned features. MultiMarkdown essentially merges the Markdown syntax with LaTeX which, if you haven’t heard of it, is a rather inscrutable but extremely powerful text formatting language. It’s popular in the CS and physics disciplines. LaTeX produces beautiful documents, but it’s easy to spend a week or more adjusting the formatting and reading the documentation trying to figure out some of the more complicated features. MultiMarkdown looks like it will do much of the more basic LaTeX formatting, but without the headache.

F$@%ing R: Adventures with Tcltk in OSX
http://www.ngcrawford.com/2009/10/28/fing-r-adventures-with-tcltk/
Wed, 28 Oct 2009 22:04:01 +0000

I’ve got a bunch of RNA-seq reads I need to analyze, and for the most part I’ve been writing my own code to do the analysis. However, a recent paper in Bioinformatics (Wang et al. 2009) describes a new R package for identifying differentially expressed genes in RNA-seq datasets. R is a pretty straightforward language with a built-in installation system, so I should just have to type two lines of code…

source("http://bioconductor.org/biocLite.R")
biocLite("DEGseq")

Not so quick. When I run this code, R tells me it can’t find the DEGseq library. A bit more poking around on the internets and I discover that there’s an alternate download site:

But after installing some dependencies it also spits out a bunch of errors. I compare the errors… Hmmm… Both installs appear to be dying on the tcl/tk install, but tcltk is a default R library. I can see it right there in “/Library/Frameworks/R.framework/Resources/library”. Two hours later, and after trying a bunch of crap, I find this helpful website: