July 12, 2011

Graphics processing units (GPUs) are all the rage these days. Most journal issues seem incomplete unless at least one article mentions the word “GPUs”. Like any good geek, I was initially interested in the idea of using GPUs for statistical computing. However, last summer I messed about with GPUs and the sparkle wore off. After looking at a number of papers, it strikes me that reviewers are forgetting to ask basic questions when reviewing GPU papers.

For speed comparisons, do the authors compare a GPU with a multi-core CPU? In many papers, the comparison is with a single-core CPU. If a programmer can use CUDA, they can certainly code in pthreads or OpenMP. Take off a factor of eight when comparing to a multi-core CPU.

Since a GPU has (usually) been bought specifically for the purpose of the article, the CPU can be a few years older. So, take off a factor of two for each year of difference between a CPU and GPU.

I like programming with doubles. I don’t really want to think about single precision and all the difficulties that entails. However, many CUDA programs are compiled as single precision. Take off a factor of two for double precision.
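To give a feel for the kind of difficulty single precision can cause, here is a minimal Python sketch. It is only an illustration, not taken from any GPU paper: the helper names `to_float32` and `accumulate` are made up for this example, and single precision is simulated on the CPU by rounding through the `struct` module after every addition.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times.

    When `single` is True, the step and the running total are rounded to
    single precision after every addition, mimicking float arithmetic.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_float32(total)
    return total


n = 100_000
target = 10_000.0  # what 0.1 added 100,000 times should give

double_error = abs(accumulate(0.1, n, single=False) - target)
single_error = abs(accumulate(0.1, n, single=True) - target)

print("double precision error:", double_error)
print("single precision error:", single_error)
```

The same loop drifts much further from the target in single precision than in double precision, which is the sort of thing you have to keep in mind when a program is compiled for floats.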

When you use a GPU, you split the job into blocks of threads. The number of threads in each block depends on the type of problem under consideration and can have a massive impact on speed. If your problem is something like matrix multiplication, where each thread multiplies two elements, then after a few test runs, it’s straightforward to come up with an optimal thread/block ratio. However, if each thread is a stochastic simulation, the best choice becomes very problem dependent. What works for one model could well be disastrous for another.

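So in many GPU articles the reported speed-up could be reduced by a factor of 32: eight for the multi-core comparison, two for a CPU that is a year older than the GPU, and two for double precision (8 × 2 × 2 = 32)!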

Just to clarify, I’m not saying that GPUs have no future; rather, there has been some mis-selling of their potential usefulness in the (statistical) literature.

May 25, 2011

One of the podcasts I listen to each week is Security Now! Typically, this podcast has little statistical content, as its main focus is computer security, but episode 301 looks at how to generate truly random numbers for seeding pseudo-random number generators.

Generating truly random numbers to be used as a seed turns out to be rather tricky. For example, in the Netscape browser, an early version of the SSL implementation combined the time of day and the process number to seed its random number generator. However, it turns out that the process number is usually drawn from a small subset of all possible IDs, and so is fairly easy to guess.

Recent advances indicate that we can get “almost true” randomness by taking multiple snapshots of the processor counter. Since the counter advances through around 3 billion values each second, the exact value captured by any one snapshot is very hard to predict, so we can combine several snapshots to create an effectively random seed.
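As a rough illustration of the idea, the sketch below hashes several readings of a high-resolution counter into a seed. It makes a few assumptions: Python’s `time.perf_counter_ns()` stands in for the processor counter, the function name `counter_seed` and the choice of 64 snapshots are made up for this example, and SHA-256 is just one convenient way of mixing the readings. It is not the scheme described in the podcast, and it is not a vetted entropy source; for anything security related, use the operating system’s generator (for example `os.urandom`).

```python
import hashlib
import random
import time


def counter_seed(snapshots: int = 64) -> int:
    """Build a 64-bit seed by hashing several readings of a fast counter."""
    digest = hashlib.sha256()
    for _ in range(snapshots):
        # perf_counter_ns() stands in for the processor counter; the
        # low-order digits of each reading depend on timing jitter.
        reading = time.perf_counter_ns() & 0xFFFFFFFFFFFFFFFF
        digest.update(reading.to_bytes(8, "little"))
        # Yield to the scheduler so the readings are spaced irregularly.
        time.sleep(0)
    # Keep the first 8 bytes of the hash as the seed.
    return int.from_bytes(digest.digest()[:8], "little")


seed = counter_seed()
rng = random.Random(seed)  # seed an ordinary pseudo-random generator
print("seed:", seed)
print("first draw:", rng.random())
```

Once the seed has been recorded, the stream of pseudo-random numbers can be reproduced by passing the same seed to `random.Random` again.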

To find out more, listen to the podcast. The discussion on random seeds begins mid-way through the podcast.

May 12, 2011

A Makefile is a simple text file that controls compilation of a target file. The key benefit of using a Makefile is that make uses file time stamps to determine if a particular action is needed. In this post we discuss how to use a simple Makefile that compiles a tex file that contains a number of \include statements. The files referred to by the \include statements are Sweave files.

Suppose we have a master tex file called master.tex. In this file we have:
\include{chapter1}
\include{chapter2}
\include{chapter3}
....

where the files chapter1, chapter2, chapter3 are Sweave files. Ideally, when we compile master.tex, we only want to run Sweave if the time stamp of chapter1.tex is older than the time stamp of chapter1.Rnw. This conditional compiling is even more important when we have a number of Sweave files.
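The time stamp test can be sketched in a few lines of Python. This is only an illustration of the decision that make takes for us, not a replacement for the Makefile; the function name `needs_sweave` is made up for this example, and the chapter names are the ones used above.

```python
import os


def needs_sweave(rnw_path: str, tex_path: str) -> bool:
    """Return True when the .tex file is missing or older than its .Rnw source."""
    if not os.path.exists(tex_path):
        return True
    # Compare modification times, as make does for a target and its source.
    return os.path.getmtime(tex_path) < os.path.getmtime(rnw_path)


for chapter in ["chapter1", "chapter2", "chapter3"]:
    rnw, tex = chapter + ".Rnw", chapter + ".tex"
    if not os.path.exists(rnw):
        print(rnw, "not found, nothing to do")
    elif needs_sweave(rnw, tex):
        print(tex, "is out of date: run Sweave on", rnw)
    else:
        print(tex, "is up to date")
```

With a Makefile we never write this check ourselves: we only state which file is built from which, and make does the comparison.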

Meta-rules

To avoid duplication in a Makefile, it’s handy to use meta-rules. These rules specify how to convert from one file format to another. For example,
.Rnw.tex:
	R CMD Sweave $<

is a meta-rule for converting an Rnw file to a tex file. In the above meta-rule, $< is the name of the source file, e.g. chapter1.Rnw. Note that the command line must be indented with a tab. Other helpful meta-rules are:
.Rnw.R:
	R CMD Stangle $<

which is used to convert between Rnw and R files. We will also have a meta-rule for converting from .tex to .pdf.

For meta-rules to work, we have to list all the file suffixes that we will convert between. This means we have to include the following line:
.SUFFIXES: .tex .pdf .Rnw .R

Files to convert

Suppose we have a master tex file called master.tex and a Sweave file chapter1.Rnw. This means we need to convert from:

master.tex to master.pdf

chapter1.Rnw to chapter1.tex

chapter1.Rnw to chapter1.R

Obviously, we don’t want to write down every file we need, especially if we have more than one Sweave file. Instead, we just want to state the master file and the Rnw files. There are a couple of ways of doing this, but the following way combines flexibility and simplicity. We first define the master and Rnw files:

##Suppose we have three Sweave files with a single master file
MAIN = master
RNWINCLUDES = chapter1 chapter2 chapter3

Recommended Books

Pros: I quite like this book (hence the reason I put it on my list). It has a nice collection of exercises, it “looks nice” and doesn’t assume knowledge of programming. It also doesn’t assume (or try to teach) any statistics.

Cons: When describing for loops and functions the examples aren’t very statistical. For example, it uses Fibonacci sequences in the while loop section and the sieve of Eratosthenes for if statements.

I know graphics are important, but a whole book for an undergraduate student might be too much. I did toy with the idea of recommending this book, but I thought that five recommendations were more than sufficient.