I certainly know of no other scientific paper that mentions the Raspberry Pi.

Yours has certainly rekindled my interest in cell biology and Perl!

However, it would be good to point out that the downloaded image file very nearly fills a true 32 Gbyte SD card. Some nominal 32 Gbyte cards deliver only about 30 Gbytes when formatted and cannot hold the image.

The "Intenso" brand SDHC card, nominally 32 GB, that I bought from Maplin last week formats to just under 30 GB.

I don't think that the formatted size is manufacturer-related. The shortfall is probably within the manufacturing spread of each card. (If you were using the card for its intended purpose in a video camera, you probably wouldn't notice.)
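
Part of the gap may simply be units. Card makers count a gigabyte as 10^9 bytes, while many tools report sizes in units of 2^30 bytes. The following is a rough sketch of that arithmetic, not a measurement of any particular card. The function names are made up for illustration.

```python
GIB = 2**30  # bytes in one binary gigabyte (GiB)


def nominal_gb_to_gib(nominal_gb):
    """Convert a decimal-gigabyte label (10**9 bytes each) to GiB."""
    return nominal_gb * 10**9 / GIB


def gib_to_bytes(gib):
    """Convert a size reported in GiB back to whole bytes."""
    return int(gib * GIB)


# A card holding exactly 32 * 10**9 bytes, expressed in GiB:
print(round(nominal_gb_to_gib(32), 2))  # 29.8
```

So a "true" 32 GB card would already show as roughly 29.8 in a tool that counts in binary units, before any manufacturing spread or file system overhead is taken into account.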

This card also comes in at 29.2 GB and is thus not usable. Another user waiting in line for the slimmed-down update.
Sony 32 GB Secure Digital High Capacity (SDHC/SDXC)- Class 10 /UHS-I 40 MBps Read - SF32UY/TQMN

@jardino: Thanks Alan. The idea had occurred to me; however the output from blastall that you showed seemed to indicate that it could be quite complicated. I shall take a look and see if I can figure it out. If not, I might be back asking for detailed instructions.

It is data about biological cells, proteins, amino acids and genomes - human and other. The developing science of bioinformatics analyses this data for medical and other purposes.

The University of St. Andrews is using RPis rather than traditional computer networks for their bioinformatics course (see original post).

While you're in here, I have an RPI question:

Is there any possible way to use raspi-config (or something similar) to expand the root file system to some proportion of the size of the SD card?

Our problem here is that the dataset we need seems to be bigger than 16 GB, but somewhat less than 30 GB (I've not analysed it in detail yet). Some nominal 32 GB cards seem to format to as little as 28 or 29 GB, so an image created on a "big 'un" won't fit onto a "little 'un".

So it would be good if the root file system on a new installation on a 32 GB card could be expanded to, say, 25 GB - or whatever the dataset needs, plus a margin.
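
As far as I know, the raspi-config expand option grows the root file system to fill the whole card, so a partial expansion would have to be done by hand with a partitioning tool. The sketch below only works out the target size ("dataset plus a margin") and converts it to 512-byte sectors. The 20 GB dataset size, the 25% margin and the 512-byte sector size are assumptions for illustration, not figures from this thread.

```python
GB = 10**9  # decimal gigabyte, as printed on SD card labels


def target_root_bytes(dataset_bytes, margin_percent=10, card_bytes=None):
    """Return dataset size plus a percentage margin, in bytes.

    Raises ValueError if the result would not fit on the given card.
    """
    target = dataset_bytes * (100 + margin_percent) // 100
    if card_bytes is not None and target > card_bytes:
        raise ValueError("target root size exceeds card capacity")
    return target


def to_sectors(nbytes, sector_size=512):
    """Round a byte count up to a whole number of sectors."""
    return -(-nbytes // sector_size)


# Hypothetical example: 20 GB dataset, 25% margin.
size = target_root_bytes(20 * GB, margin_percent=25)
print(size // GB, "GB =", to_sectors(size), "sectors")
```

An image built with a root partition of that size should then restore onto any card with at least that much usable space, whatever its nominal label says.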

Any help offered would be appreciated.

(By the way, I'm not associated with the University - I just like tasking RPis!)

It's fun, and mildly amazing, that it fits on an SD card at all. In practice, due to its size, it really takes too long to search on the Pi. The download also gets larger as more sequencing gets done. (SwissProt - a high-quality subset of the protein sequence database - will remain in 4273pi and is no problem.)

I've ordered the 'problematic' Sony SD card inder mentioned. I will test the next release of 4273pi on that; on the SanDisk card, which we use extensively and which is known to work (http://eggg.st-andrews.ac.uk/files/2013 ... 4273pi.pdf); and on the 'problematic' Intenso SD card. I put 'problematic' in quotes because I'm sure these are fine SD cards in general.

Finally, the work instruction ... I'm glad this is proving to be of some use.

- As Alan said, it includes some hoops so I can use a static IP suited to my part of the University of St Andrews, yet set the card up to also work with DHCP. If you only want DHCP, the static IP stuff is not useful or important.

- I would suggest using a large swapfile (script_3.sh). Some sources suggest this can eventually damage the SD card, but I'm not sure this is likely. More important, it's not nice to run out of address space.
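
To confirm that a larger swapfile is actually in effect after setup, one option is to read the SwapTotal field of /proc/meminfo. This is a minimal sketch and says nothing about what script_3.sh itself does. The helper names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_total_mb(text):
    """Return the configured swap size in whole MB (0 if not reported)."""
    return parse_meminfo(text).get("SwapTotal", 0) // 1024


# Only attempt the real read where /proc/meminfo exists (Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        print("Swap (MB):", swap_total_mb(fh.read()))
```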

The NCBI's 'nr' BLAST database is split into several physical files to make it more manageable. But it is still one single BLAST database, as far as the user is concerned. When searching 'nr', BLAST 'knows' to look in all of these files.

I don't think there's any biological rationale behind the assignment of sequences to the component files. Omitting some of the files is only OK if you never use the 'nr' database. It doesn't make sense to omit some files if you do use it: an arbitrary set of sequences will be unavailable and no warning will be issued. To avoid confusion, it might be safer to omit the 'nr' database entirely. I think that if you leave all the 'nr' files in gzipped form, BLAST will not use them. (This would be worth checking. If blastall doesn't find 'nr', it should issue an error for any search with '-d nr'.)
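
Before relying on this, it may help to list which 'nr' files in the database directory are still gzipped and which have been unpacked, since a mixture is the case to avoid. The sketch below assumes every component file name starts with "nr." and that packed files end in ".gz". Both are assumptions about the layout, so adjust them to match what is actually on the card.

```python
import os


def classify_nr_files(filenames):
    """Split 'nr' file names into (still gzipped, unpacked) lists."""
    packed, unpacked = [], []
    for name in sorted(filenames):
        if not name.startswith("nr."):
            continue  # ignore files belonging to other databases
        if name.endswith(".gz"):
            packed.append(name)
        else:
            unpacked.append(name)
    return packed, unpacked


def is_partial(filenames):
    """True if some 'nr' files are unpacked while others are still gzipped."""
    packed, unpacked = classify_nr_files(filenames)
    return bool(packed) and bool(unpacked)

# Usage, with a hypothetical directory:
#   print(is_partial(os.listdir("/path/to/blast/db")))
```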

For the 4273 Bioinformatics for Biologists course as it currently stands, all of the 'nr' files are required, just because students are asked to search 'nr' in an 'own-time' exercise in the Week 1 practical (practical_linux_perl.pdf, p. 5). However, this task isn't very central to the course and could be omitted. This would be my suggestion.

An alternative, but more difficult and time-consuming, exercise would be to install 'nr' on a USB stick before performing this search. This would involve unzipping the 'nr' files into a directory on the USB stick and making sure BLAST can find them, for example by modifying the BLASTDB environment variable.
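
A minimal sketch of the BLASTDB approach follows: it sets the variable for a single blastall run rather than changing it system-wide. The mount point /media/usb/blastdb and the query file name are hypothetical, and the actual search is left commented out because it needs blastall and the unpacked database to be present.

```python
import os
import subprocess


def blast_env(db_dir, base_env=None):
    """Return a copy of the environment with BLASTDB pointing at db_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["BLASTDB"] = db_dir
    return env


def blastall_command(query, database="nr", program="blastp"):
    """Build a blastall argument list for a protein search."""
    return ["blastall", "-p", program, "-d", database, "-i", query]


# Hypothetical paths; uncomment to run a real search.
# subprocess.run(blastall_command("query.fasta"),
#                env=blast_env("/media/usb/blastdb"), check=True)
```

Passing the environment per run leaves the rest of the course material, which expects the databases on the SD card, unaffected.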

- The Open Access course, 4273pi Bioinformatics for Biologists, is now arranged into separate 'components'. This makes it far easier to create your own short course or integrate components with other teaching material.