Posts Categorized / Code

Since I do a lot of single-cell RNA-seq analysis, the data I typically work with are matrices of cells by genes, with values given as counts or FPKMs. Gene IDs depend on the GTF annotation file and are usually either HUGO symbols or ENSEMBL IDs. ENSEMBL IDs are hard to interpret, so I often

Prevent losing your work due to broken connections by using screen. It is especially useful when working over an unstable connection. For installation and other information, see: http://www.gnu.org/software/screen/ To use screen: ssh into your remote server as usual. Start it with the command screen. You will see an introduction. Just press Enter. Now you are using screen! Easy as

10 useful command line editing commands: C is Control. M is Meta, or Option on Mac (assuming you have gone to Preferences -> Settings -> Keyboard and turned on “Use option as meta key”).
Move to the start of the line: C-a
Move to the end of the line: C-e
Move forward a word: M-f

This R function lets you build a distance matrix using a custom distance function (something other than Euclidean, etc.). Given a list and a custom distance function, it returns a matrix of the pairwise distances between all elements of the list, as computed by that function. code: custom.dist
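The post's R code is not shown in this excerpt, so the following is only a minimal Python sketch of the same idea, not the author's custom.dist. The names `custom_dist` and `hamming` are hypothetical, and the sketch assumes the distance function is symmetric with a zero distance from an element to itself, so only the upper triangle is computed.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def custom_dist(items: Sequence[T], dist_fn: Callable[[T, T], float]) -> List[List[float]]:
    """Return an n x n matrix of pairwise distances between items.

    Assumption: dist_fn is symmetric and dist_fn(x, x) == 0, so each
    pair is evaluated once and mirrored across the diagonal.
    """
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = float(dist_fn(items[i], items[j]))
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def hamming(a: str, b: str) -> int:
    """Hypothetical example distance: mismatches between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Pairwise Hamming distances between three short sequences.
distances = custom_dist(["ACGT", "ACGA", "TCGA"], hamming)
```

Any callable that takes two elements and returns a number can be passed in place of `hamming`; if the distance is not symmetric, the loop would need to cover every ordered pair instead.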

This R function returns a list of the 10 largest objects stored in memory. It is particularly useful in combination with rm() and gc() for removing large objects that are no longer needed and freeing up memory. code: list.top.obj.in.mem

Consider the following multiple sequence alignment:
>HUMAN
-----------------METTNGT-ETWYESLHAVLKALNATLHSNLLCRPGPGL--G--
>MOUSE
-----------------METSNGT-ETWYMSLHAVLKALNTTLHSHLLCRPGPGP--G--
>CHICKEN
-----------------MEEDNRT-EPWHHSLQAMLDALNQTLHRAILHP-ST-------
…
Let’s say you want to remove the columns from the alignment where the query (human) protein sequence has a gap. You may want to do this for a number of reasons, though I had to do this trick to get my local
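The post's own solution is not included in this excerpt. As a rough illustration of the column removal described above, here is a minimal Python sketch. The function name `drop_query_gap_columns`, the dictionary representation of the alignment, and the toy sequences are all assumptions made for the example; it also assumes gaps are written as "-" and that every sequence has the same length.

```python
from typing import Dict


def drop_query_gap_columns(alignment: Dict[str, str], query: str, gap: str = "-") -> Dict[str, str]:
    """Remove every alignment column in which the query sequence has a gap."""
    query_seq = alignment[query]
    if any(len(seq) != len(query_seq) for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Column indices where the query has a residue rather than a gap.
    keep = [i for i, ch in enumerate(query_seq) if ch != gap]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (hypothetical, much shorter than the one in the post).
toy = {
    "HUMAN": "--MET-G",
    "MOUSE": "-AMES-G",
    "CHICKEN": "--MEEDN",
}
trimmed = drop_query_gap_columns(toy, query="HUMAN")
```

After trimming, the query contains no gaps and every other sequence keeps only the columns aligned to a query residue, so all sequences still share one length.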