
We have been thinking about using HDF5 files to store gene expression tables. Compared to flat files, they offer the significant advantage of constant-time access to individual elements and slices. Compared to traditional PDL (Perl Data Language) arrays, they are not limited by 32-bit architectures, so larger tables can be stored. Compared to MySQL tables, they do not run into problems with very large numbers of columns. So they look like a good solution. My colleagues are working on an R interface, so I wanted to try the existing Perl interface and see whether it can actually meet the following design goals…
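To make the access-time argument concrete, here is a stdlib-only Python sketch of the idea that binary array formats rely on. It is not HDF5 and not its Perl or R interface: when every value has a fixed width and the matrix is stored in row-major order, the byte offset of cell (i, j) can be computed directly, so reading one value or a contiguous run of a row takes a single seek instead of a line-by-line scan of a text file. The file layout and the function names (`write_matrix`, `read_element`, `read_row_slice`) are hypothetical; real HDF5 datasets add chunking, compression and metadata on top of this.

```python
import os
import struct
import tempfile

FMT = "<d"                   # little-endian float64
ITEM = struct.calcsize(FMT)  # 8 bytes per value


def write_matrix(path, rows):
    """Write a dense genes x samples matrix in row-major order.

    Returns (nrows, ncols); the reader needs ncols to compute offsets.
    """
    ncols = len(rows[0])
    with open(path, "wb") as fh:
        for row in rows:
            if len(row) != ncols:
                raise ValueError("all rows must have the same length")
            fh.write(struct.pack("<%dd" % ncols, *row))
    return len(rows), ncols


def read_element(path, ncols, i, j):
    """Read cell (i, j) with one seek: the offset depends only on i, j and ncols."""
    with open(path, "rb") as fh:
        fh.seek((i * ncols + j) * ITEM)
        return struct.unpack(FMT, fh.read(ITEM))[0]


def read_row_slice(path, ncols, i, start, stop):
    """Read columns [start, stop) of row i; they are contiguous on disk."""
    n = stop - start
    with open(path, "rb") as fh:
        fh.seek((i * ncols + start) * ITEM)
        return list(struct.unpack("<%dd" % n, fh.read(n * ITEM)))


if __name__ == "__main__":
    expr = [[0.0, 1.5, 2.25],
            [3.0, 4.5, 5.75]]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "expr.bin")
        nrows, ncols = write_matrix(path, expr)
        print(read_element(path, ncols, 1, 2))       # 5.75
        print(read_row_slice(path, ncols, 0, 1, 3))  # [1.5, 2.25]
```

In this row-major layout a column slice (one sample across all genes) still needs one seek per row; choosing the chunk shape of an HDF5 dataset is one way to balance row and column access.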

It is easy to store fixed-length sequences, such as ordered pairs or triples, in a relational database: just define one column for each element of the sequence, and each row of the table holds one tuple. When the number of components varies, as with the exons of a transcript, it is more complicated. What are the different options for storing this type of data, and how do they compare in storage requirements and access times for different tasks?
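A minimal sketch using Python's standard-library `sqlite3` module contrasts two of the options. SQLite is an assumption here (the post does not say which database or which schemas it compares), the table and column names are hypothetical, and exon coordinates are assumed to be half-open `[start, end)`. Layout A is a normalized child table with one row per exon; layout B keeps one row per transcript and packs the exon coordinates into comma-separated text columns.

```python
import sqlite3


def build(conn, transcripts):
    """Store {transcript_name: [(start, end), ...]} in two alternative layouts."""
    # Layout A: normalized, one row per exon in a child table.
    conn.execute("CREATE TABLE transcript (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE exon (transcript_id INTEGER REFERENCES transcript(id), "
        "exon_rank INTEGER, exon_start INTEGER, exon_end INTEGER, "
        "PRIMARY KEY (transcript_id, exon_rank))")
    # Layout B: one row per transcript, coordinates packed into comma-separated text.
    conn.execute(
        "CREATE TABLE transcript_packed "
        "(name TEXT PRIMARY KEY, exon_starts TEXT, exon_ends TEXT)")
    for name, exons in transcripts.items():
        cur = conn.execute("INSERT INTO transcript (name) VALUES (?)", (name,))
        tid = cur.lastrowid
        conn.executemany(
            "INSERT INTO exon VALUES (?, ?, ?, ?)",
            [(tid, rank, s, e) for rank, (s, e) in enumerate(exons)])
        conn.execute(
            "INSERT INTO transcript_packed VALUES (?, ?, ?)",
            (name,
             ",".join(str(s) for s, _ in exons),
             ",".join(str(e) for _, e in exons)))
    conn.commit()


def exons_normalized(conn, name):
    """Layout A: a join, with exons returned in rank order."""
    return conn.execute(
        "SELECT e.exon_start, e.exon_end FROM exon e "
        "JOIN transcript t ON t.id = e.transcript_id "
        "WHERE t.name = ? ORDER BY e.exon_rank", (name,)).fetchall()


def exons_packed(conn, name):
    """Layout B: one row lookup, then parse the packed text in application code."""
    row = conn.execute(
        "SELECT exon_starts, exon_ends FROM transcript_packed WHERE name = ?",
        (name,)).fetchone()
    if row is None:
        return []
    starts = [int(x) for x in row[0].split(",") if x]
    ends = [int(x) for x in row[1].split(",") if x]
    return list(zip(starts, ends))


def transcripts_covering(conn, pos):
    """Layout A only: the positional test runs inside SQL."""
    rows = conn.execute(
        "SELECT DISTINCT t.name FROM exon e "
        "JOIN transcript t ON t.id = e.transcript_id "
        "WHERE e.exon_start <= ? AND ? < e.exon_end ORDER BY t.name",
        (pos, pos)).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn, {"tx1": [(100, 200), (300, 400)], "tx2": [(150, 250)]})
    print(exons_normalized(conn, "tx1"))  # [(100, 200), (300, 400)]
    print(exons_packed(conn, "tx1"))      # [(100, 200), (300, 400)]
    print(transcripts_covering(conn, 160))  # ['tx1', 'tx2']
```

The packed layout returns a whole transcript with a single row lookup, but answering a positional question like `transcripts_covering` would mean parsing every row in application code; the normalized layout keeps that query in SQL at the cost of more rows and a join.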

I’m trying to get Yarden Katz’s program MISO running on the Harvard FAS computing cluster. It is not so easy to satisfy all the dependencies.

Some of the dependencies, including pygsl, scipy and numpy, are already available through add-on modules. The problem is that they are not all the right versions, and I get errors such as the following:

ValueError: can't handle version -1 of numpy.dtype pickle
ImportError: cannot import name defaultdict
ValueError: numpy.dtype does not appear to be the correct type object
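When several add-on modules each put their own numpy or scipy on the path, it helps to see which copy Python actually imports. The short Python sketch that follows is a hypothetical diagnostic, not part of MISO or of the cluster's module system: it imports each dependency, records its `__version__` and file location, and captures the exception text when an import fails. It catches every exception rather than only `ImportError`, because, as the errors above show, version mismatches can surface as `ValueError`.

```python
import importlib
import sys


def report(modules):
    """Try to import each module; return {name: (version, location_or_error)}."""
    out = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception as exc:  # mismatches often surface as ValueError, not ImportError
            out[name] = ("IMPORT FAILED", "%s: %s" % (type(exc).__name__, exc))
            continue
        out[name] = (getattr(mod, "__version__", "unknown"),
                     getattr(mod, "__file__", "built-in"))
    return out


if __name__ == "__main__":
    print("python %s (%s)" % (sys.version.split()[0], sys.executable))
    for name, (ver, where) in sorted(report(["numpy", "scipy", "pygsl"]).items()):
        print("%-8s %-14s %s" % (name, ver, where))
```

Running it in the same shell environment where MISO fails, and comparing the printed paths, can reveal whether numpy and scipy are being loaded from different add-on modules or installation prefixes.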