"For a moment, nothing happened. Then, after a second or so, nothing continued to happen." — HHGTTG

Month: July 2006

Back in May I wrote about the performance problems we were having with our new NFS-based user filestore. It’s been a while since then, and the problems have continued. We have noticed that it appears to be load related – not just network load, but also load on the machine itself. This suggests that our theory about IPsec causing the slowdown may be correct.

Our original plan was to try a private network which would remove the need for IPsec and also remove any latency added by routing the traffic between our subnets. This still seemed like a good plan, so I asked around and another department kindly lent us a brand new gigabit switch. We’ve connected this to one of our NFS clients and to the cluster node that’s currently running our filestore.

So far we’ve noticed some serious performance boosts. There are only a few of us using it, so it could just be that it’s a lightly loaded connection – time will tell on that one. The bottom line is that it seems to be quicker than the IPsec connection ever was, so hopefully we’re on to a winner. We’ve also got a few staff testing it out, and their responses have been positive so far.

The next step after this testing period is to look at the costs of doing this properly with our own equipment. One of the key things we’ve been doing recently is increasing the redundancy of our systems, so it’d be fairly daft to do this with just one switch. We’d need at least two, with every cluster node connected to both, and every client that we want optimum performance on connected to both. Obviously there’ll be other clients that are less important and they can continue to use the existing infrastructure.

Of course, I’ve got absolutely no idea where we’ll put these switches, or how we’ll wire them in – things are pretty tight in our racks at the moment. I suppose there’s got to be a challenge somewhere 🙂

My only worry with all this is what we’ll do if it doesn’t work. I don’t have any other ideas that’d make it go quicker – to be frank, you can’t really get much quicker than a directly connected switch. Let’s hope we don’t have to worry about it.

In my last post about setting up a slimserver I said that I was having trouble getting slimp3slave working:

Whilst it doesn’t appear to have any problems, I didn’t have much success with the players. mpg123 got confused by the stream, and madplay kept skipping the beginnings of tracks when I hit next on the server. This could be a problem with slimp3slave – I’ll need to investigate.

The problem did turn out to be with slimp3slave. I discovered that when skipping a track the stream is restarted, which caused slimp3slave to start up a new player. The problem was that it did this before the old one had exited, so the new one died because it couldn’t access the sound device. There’s another bug here too – it didn’t notice the new player dying and kept trying to write to it, which resulted in lots of SIGPIPE messages.

So I looked at the code for shutting down the player and noticed that it wasn’t using the right close function; switching to the proper one fixed it.