Monday, November 22, 2010

Over the summer I was asked if I would do an interview to illustrate what University of Southampton graduates can end up doing. The discussion I had with Karen was great fun, and the resulting profile seems to have turned out ok. I always find these thing hard. It's one thing writing a technical document, but quite another to collaborate on something more personal. The family joke is that the acknowledgments was the hardest section of my books for me to write.

Whilst I'm talking about university life, I was surprised to find my PhD thesis listed (but unavailable) at Amazon.co.uk.

Saturday, November 13, 2010

It was an exciting morning - my copy of Multicore Application Programming was delivered. After reading the text countless times, it's great to actually see it as a finished article. It starting to become generally available. Amazon lists it as being available on Wednesday, although the Kindle version seems to be available already. It's also available on Safari books on-line. Even turned up at Tesco!

Friday, November 12, 2010

I was using a tool the other day, and I started it in the background. It didn't come up and, when I looked it had stopped. When this has happened in the past I foreground it and it continues working. I've only noticed this on rare occasions, and I'd previously put it down to some misconfiguration of the system. However, one of my colleagues had also noticed it, so this was the ideal opportunity to figure out what really was going on.

The first step was to identify which process was stopped using jobs -l:

So which sources the .cshrc file, and my .cshrc file happened to contain stty erase ^H. So why does this cause the process to stop?

Well stty controls the characteristics of the terminal, but when the script is executing in the background, there is no terminal. When there's no terminal, stty stops and waits for one to appear!

The easiest is to move the call to stty into my .login file. The .login file is only parsed at login, and not every time a shell is started. Alternatively, it's possible to check for the existence of a prompt:

if ($?prompt) then
if ("$prompt" =~ ?*) then
/usr/bin/stty erase ^H
endif
endif

So there was a small error in the code. If the total work was not a multiple of the number of threads, then some of the work didn't get done. For example, if you had 7 iterations (0..6) to do, and two threads, then the chunksize would be 7/2 = 3. The first thread would do 0, 1, 2. The second thread would do 3, 4, 5. And neither thread would do iteration 6 - which is probably not the desired behaviour.

However, the fix is pretty easy. The final thread does what ever is left over:

Redoing our previous example, the second thread would get to do 3, 4, 5, and 6. This works pretty well for small numbers of threads, and large iteration counts. The final thread at most does nthreads - 1 additional iterations. So long as there's a bundle of iterations to go around, the additional work is close to noise.

But.... if you look at something like a SPARC T3 system, you have 128 threads. Suppose I have 11,000 iterations to complete, I divide these between all the threads. Each thread gets 11,000 / 128 = 85 iterations. Except for the final thread which gets 85 + 120 iterations. So the final thread gets more than twice as much work as all the other threads do.

So we need a better approach for distributing work across threads. We want each thread to so a portion of the remaining work rather than having the final thread do all of it. There's various ways of doing this, one approach is as follows:

If, like me, you feel that all this hacking around with the distribution of work is a bit of a pain, then you really should look at using OpenMP. The OpenMP library takes care of the work distribution. It even allows dynamic distribution to deal with the situation where the time it takes to complete each iteration is non-uniform. The equivalent OpenMP code would look like:

This is chapter 3 "Identifying opportunities for parallelism". These range from the various OS-level approaches, through virtualisation, and into multithread/multiprocess. It's this flexibility that makes multicore processors so appealing. You have the choice of whether you take advantage of them through some consolidation of existing applications, or whether you take advantage of them, as a developer, through scaling a single application.