BOINC testing
We got 7.6.22 earlier in the week. I have it installed on the Parallella's and Windows machines. I am waiting on it to get into Debian Stretch (takes about a week) before I can install it on the Pi's. The main changes are around OpenCL detection and closing open files.

Raspberry Pi updates
Debian Stretch introduced a new libc6 (C runtime library) last week. It broke the Einstein BRP4 1.06 app that is sent to NEON-incapable hosts like my B+. I ended up going back to the official Jessie release to get it going again.

I have an ongoing issue with Debian Stretch: the std*.log files that BOINC creates are no longer getting updated. This seems to have started around BOINC 7.6.12 (October 2015), but I don't think it's BOINC; more likely it is some other change that Debian has made. I say that because the Windows machines and the Parallella don't have this issue. I did email the Debian BOINC maintainers, but they can't see anything in the BOINC code that would cause it.

End of year
The end of 2015 is almost upon us. On reflection, the farm has been updated to the most recent hardware available. Hopefully next year we will see some useful (from a cruncher's perspective) changes to BOINC.

I mentioned in my last post an MPI capable BOINC would be useful for those who run farms and clusters. Another one I had hoped for was the Superhost idea. I may need to help fund development on one or both of them to get things happening.

12 December 2015

The Intel-GPU machines are crunching Seti work overnight, only because it's too hot during the day.

The Nvidia-GPU machines are off.

BOINC and MPI
As you'd know from reading my blog, I have a "farm" of machines that currently run BOINC to do the task scheduling. I had an idea: rather than running BOINC on each machine, why not run it on one machine and have it use MPI to communicate with the others and run tasks that way? If other people are interested in this idea then let's talk.

To that end I was following the instructions from the University of Southampton for setting up an MPI cluster, using my redundant Raspberry Pi B models. Southampton has published the details of their Lego Supercomputer online. In the end it was still compiling the MPI software after some hours, despite the fact that I overclocked the Pi in question to 1GHz. I gave up and went to bed. I will have to kick it off when I have plenty of time to waste, seeing as it takes so long on the old B model.

The idea is that one would replace the BOINC API calls with MPI calls. Each app runs stand-alone, which most science apps seem to be written to do anyway, and then just passes the result files back for the BOINC client to handle. A better solution would be to have both sets of code and then the app can work via MPI or the BOINC API.

The BOINC client would need to be updated to check the status of worker machines and give tasks to the worker machines as needed. Exactly what a real compute cluster does, only BOINC is doing the scheduling and still handling the file transfers, etc.
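A rough way to picture the coordinator idea is a single queue of tasks that idle workers draw from. The sketch below is plain Python with threads standing in for the worker machines. It makes no real MPI or BOINC calls, and `run_farm`, `crunch` and the worker names are hypothetical names made up for illustration.

```python
import queue
import threading


def run_farm(tasks, worker_names, crunch):
    """Hand out tasks from one central queue to whichever worker is idle.

    crunch(worker_name, task) is a hypothetical stand-in for running a
    science app on a worker machine and returning its result.
    """
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)

    done = {}
    done_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task = todo.get_nowait()  # an idle worker asks for more work
            except queue.Empty:
                return                    # nothing left, so the worker stops
            result = crunch(name, task)
            with done_lock:
                done[task] = (name, result)  # result handed back to the coordinator

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    finished = run_farm(["wu1", "wu2", "wu3"], ["pi-a", "pi-b"],
                        lambda worker, task: task.upper())
    print(sorted(finished))
```

Which worker ends up with which task is not deterministic here, which is roughly what you would expect from a farm of machines of different speeds.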

28 November 2015

Farm status
We had a hot day again this week so the entire farm was off.

Parallella's and Raspberry Pi's currently doing Einstein work.

Intel GPU machines running overnight doing Seti work.

Nvidia GPU machines are off.

Intel-GPU machine upgrades
So far 4 out of 6 machines have been upgraded. Two more are in the shop awaiting CPU's to arrive. I should be picking them up next weekend.

It seems the Intel HD Graphics 530 drivers give invalid results. I've tried the original one supplied, the current release one 15.40.12.4300 and the prior version without any success. There is a beta 15.40.12.4326 which I have yet to try, however the release notes don't make any mention of OpenCL so I doubt it will make any difference.

I mentioned in my last post that I tried Linux Mint on one machine, but it was corrupting the video display. Since then I've downloaded the latest BIOS version and updated all four machines. When the other machines come in I may try Debian on them. Linux usually doesn't get updated graphics drivers, so I may not be able to use the GPU if I run Linux on them. GPU support is about the only reason why I run Windows.

Further upgrades
The Nvidia GPU part of the farm has a couple of 5th generation 6 core/12 thread machines with dual GTX750Ti's and there are also a couple of i7-3770's with GTX970's that I use for GPUgrid work.

My current thinking is to replace the i7-3770's with the same motherboard that I've been using recently (Asus H170-Pro) with an i5-6600 CPU. This way the CPU is basically there to run the GPU and not much else.

22 November 2015

We had a couple of hot days during the week where it got up to 41 degrees C. Everything was off that day.

Hardware upgrades
Three (out of six) i7-3770 machines that make up the Intel GPU part of the farm have been updated to i7-6700's. The other three are planned for next weekend.

Intel GPU drivers for the HD Graphics 530 produce invalid results. I've tried 3 different drivers so far without any success. I have suspended crunching on the built-in GPU for the moment.

Due to Win10 rebooting machines at 3:30am (after it installs updates) I have gone back to Win7 on the new machines.

I tried Linux Mint 17.2 on one machine but it corrupted the video display, and that was just trying to get it installed. I could put Debian or Ubuntu on them but then I would lose the ability to use the GPU - not a great loss at the moment because they don't work anyway. However I expect Intel will correct their drivers eventually.

BOINC testing
We got 7.6.15 this week. The main tweaks are around the task CPU and I/O priorities under Windows. There have been reports of increased task failures, BOINC manager not updating and the client taking a long time to start up. A number of us have suggested backing the changes out. I've gone back to an earlier build (7.6.9).

Pi woes
I have also been battling with the Raspberry Pi's. The B+ refused to upgrade to Debian Stretch. I managed to brick it a couple of times and had to go back to the official Raspbian Jessie image.

Meanwhile the Pi2's have got onto Debian Stretch but are refusing to put the latest BOINC client on, complaining about unmet dependencies, even though they've got everything on the list installed.

07 November 2015

The B+ Raspberry Pi is playing up. It's probably worn out the micro-SD card, so I will get a new one and copy across the image from the current card.

The Nvidia GPU machines are off.

Upgrades
I upgraded one of the Intel GPU machines this morning. I'm running some work on it now to get an idea how much faster (or slower) it is than the i7-3770 that it replaced. The BOINC benchmarks indicate lower floating point but greater integer performance. The GPU is noticeably faster than the one in the i7-3770.

I've replaced the power supply with a lower wattage (450W) gold-rated unit. I also replaced the motherboard (ASUS H170), CPU (i7-6700), memory (8GB DDR4 at 2133MHz) and CPU cooler. It's in the original case using the original DVD-ROM and hard disk.

Windows decided it was no longer activated. I called Microsoft to get an activation code and, after going round the phone queue 3 times providing my Win7 product code, I was told that I have to reinstall Win7, activate it and then upgrade to Win10. That was my afternoon written off.

I will order the parts for the remaining Intel-GPU machines (5 of them) next week and probably be in the computer shop doing them a pair at a time for the next few weeks.

BOINC news
The BOINC 7.6.12 from the Debian Stretch repository seems to have an issue. It no longer writes out to the log files. The stderrdae and stdoutdae are no longer created and the std*gpu files are empty. We think this may be caused by other Debian updates. The Parallella version which runs under Ubuntu doesn't have this problem.

We got 7.6.14 today, so I have that installed on 1 of the Intel-GPU machines at the moment. It has some tweaks to reduce the I/O priority of itself and the tasks (not to be confused with the CPU priority) for Windows only. There is some further refinement around number and time displays in the manager and tweaks for showing multi-GPU tasks.

CPDN
ClimatePrediction went off the air a fortnight ago. They had a problem with their storage sub-system controller that required a firmware update and then they had to restore/rebuild their 200TB storage array. They came up partially last week and then took most services off the air. They seem to be back online now although it's very slow, probably due to large numbers of climate models being returned.

01 November 2015

Farm status
Currently everything except the Pi's and Parallella's is off. I have been crunching overnight, one week doing Asteroids and another week on Seti. It's usually too hot in the afternoon and evening to crunch.

New builds
No progress on the new builds yet. I did ask the shop if they could come up with a decent power supply around 360W but the nearest they have is a 450W. The i7-6700 that I plan on using only has a 65W TDP, so there is no point in having too big a power supply when I'm trying to reduce the power consumption. I will have to chase the shop up and see if I can get things moving.

For sale
I still have the old i7-970 machines for sale if anyone wants them. I have two. They've been decommissioned from the crunching farm and are sitting around gathering dust. They are in CM Storm Sniper cases which are pretty big so you'd have to be willing to collect from Sydney's eastern suburbs.

The SuperMicro X8STi-F motherboard is also available. It came out of the file server when it got upgraded. It has an i7-920 CPU, Intel heat sink, 6GB of memory and driver disc. Currently in an anti-static bag.

And lastly I have a collection of various graphics cards from GTS250 (happy to give them away), GTS450SP, two GTX570 and one GTX670.

18 October 2015

Farm status
Intel-GPU machines
One or two running overnight doing Seti work (depends on how hot it gets)

Nvidia GPU machines
Off

Parallella's
Running Einstein BRP4 work

Raspberry Pi's
Running Einstein BRP4 work

Parallella and Pi updates
LocutusOfBorg has updated his PPA with BOINC 7.6.12. I have updated both of the Parallella's with it. It has a fix to report the ARM CPU features properly. It also has a fix for reusing slot directories that only affects Linux and OSX clients.

The Raspberry Pi foundation seem to have created their Debian Stretch repositories. Well partially, archive.raspbian.org doesn't seem to have them yet. I have updated the Pi2's (the B+ is upgrading now). Stretch is the testing version of Debian so expect many changes before its release. It does however provide BOINC 7.6.12.

Windows updates
Microsoft have been pushing Windows 10 out to Win7 and Win8 machines even if they didn't select the update. Supposedly this was an error. It seems they are getting quite aggressive with getting people onto Win 10. In the meantime I will look at getting the Intel GPU machines onto Linux as a longer-term goal.

Raspbian updated
The Raspberry Pi Foundation finally got to releasing Debian Jessie. Raspbian is the customised version of Debian that is officially supported on the Pi. You can also run other operating systems such as RISC OS and Ubuntu although they don't have official support.

They have yet to set up repositories for Debian Stretch (testing) so I still can't get an up to date BOINC client short of compiling my own. One of the bugs with the 7.4.23 version in Debian Jessie seems to be that the log files grow without bound. Despite setting a 2MB file size limit I regularly get log files in the order of 38-53MB and have to manually clean them up.

Climate models
A batch of 12,000 Pacific North West (PNW) regional models were released by the project, so I have been crunching these for the last couple of weeks. Plenty of supply. One of the upload servers was off for a few days and so I had hundreds of zip files stuck trying to upload. The zip files are mostly 8MB and the final (13th file) is 15MB for each work unit. They fixed the server a few days ago but it took me two days to clear the backlog. When 6 machines all try to upload at the same time my 1Mbit upload speed just can't cope and nothing much gets through. I had to clear one machine at a time until they were all done.
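To put rough numbers on that backlog, here is a back-of-envelope sketch in Python. It assumes the file sizes above are megabytes, that the link is a full 1 Mbit/s with no protocol overhead, and that nothing else is using it. The function name `upload_seconds` is made up for illustration.

```python
def upload_seconds(total_megabytes, link_megabits_per_second):
    """Best-case transfer time: megabytes * 8 bits, divided by the link rate."""
    return total_megabytes * 8 / link_megabits_per_second


# 13 zip files per work unit: twelve of about 8 MB plus a final one of about 15 MB
work_unit_mb = 12 * 8 + 15

per_unit = upload_seconds(work_unit_mb, 1.0)
print(work_unit_mb, "MB per work unit")
print(round(per_unit / 60, 1), "minutes per work unit at 1 Mbit/s")
```

That works out to 111 MB and a little under 15 minutes per work unit in the best case, so a queue of dozens of finished work units adds up to many hours even before several machines start competing for the same link.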

Now that uploads are sorted I am running off the remaining work units which takes around 32-36 hours. I've been running the machines overnight when it's cooler so they get some processing time. The climate models don't like being stopped and quite often crash if you shut down BOINC.

OpenVMS
A year or so ago I got myself a second-hand Alpha Server DS15. Unfortunately I had a few issues with it wanting a terminal plugged in instead of using the system console. Anyway I'd like to get OpenVMS installed again and running. I've got discs and a hobbyist license so if there are any OpenVMS gurus out there who live in Sydney I'd love to hear from you.

27 September 2015

Farm status
Intel GPU's are running a mix of Climate Prediction regional work units (CPU) and Einstein on their GPU's.

The Nvidia GPU's are also running some Climate Prediction regional work (CPU) and nothing on their GPU's. There are no "short" GPUgrid work units available for the GPUgrid crunchers.

The Pi's and Parallella's are running Einstein BRP work.

Win10 upgrades
I spent a bit of time running around to each machine upgrading them to Windows 10. I had one of the Intel GPU machines fail complaining about being unable to boot, but a subsequent attempt worked fine.

I have also updated the GPUgrid machines however they don't have any short work units available so haven't been able to check them. They had an old version of EVGA Precision (2.03) installed on these machines and it didn't work so I need to get a more up to date version. I usually use Precision to control the fan speeds as the cards are already factory overclocked.

I still have the 6 core/12 thread machines to do but wanted to see how the Nvidia drivers work (or don't) under Win10 before doing them.

So far I haven't had many issues; just finding out where things are in Win10 has been the biggest problem. I use the O&O ShutUp10 program to set a lot of the privacy settings and then work my way through the settings panel for the others. I remove some of the junk they give you like the Xbox app which a dedicated number cruncher really needs. To do that you use PowerShell commands. I turn off a few services and fiddle with the time synchronisation. I put the old task manager on them and finally let CCleaner loose to clean things up.

19 September 2015

The Intel GPU machines were up until a few hours ago running Seti work, however there is no network connection to Seti or BOINC at the moment. I have switched them over to Asteroids for the time being.

The Nvidia GPU machines have been resting.

Windows 10
I have updated a couple of the Intel GPU machines from Windows 7 to Windows 10. Apart from having to set all the privacy settings and set things up again they seem to be working fine. If there are no problems for a week or so I will update the rest of the Intel GPU machines.

The only reason why I didn't switch to Linux is because setting up a GPU to work with Linux is such a pain and usually doesn't work. If the Linux distro people could get them to work out of the box then I would have switched to Linux.

A number of the Win 7 machines are silently downloading Win 10 despite my removing various Windows updates. The file server has also copped the telemetry updates which Microsoft seems to consider mandatory. Apart from MS snooping they seem to want to force people onto Windows 10. A cynical person would suggest that the NSA had a hand in this current direction that Microsoft is going down. Well I haven't got anything to hide as most of my machines are dedicated number crunchers with nothing else on them.

05 September 2015

Farm news
The Intel GPU machines spent most of the week doing Seti and then turned to Asteroids. A few ANZ climate models were released by ClimatePrediction so all the Intel GPU machines are crunching them now.

The Raspberry Pi's and Parallella's are doing Einstein BRP4 work.

A couple of the Nvidia GPU's have been started up this weekend and are doing Seti work.

Parallella and Ubuntu
I found an updated Parallella image that was released 30th of Jan 2015. It is based on Trusty (14.04 which is a long term support release) however it has a more up to date kernel. I have downloaded it and then gone through the process of reapplying my usual changes. This image seems to behave better than the image I had before which I believe is because of kernel and firmware related fixes.

I'm running off the remaining Einstein BRP4 work units and then I will switch over to this newer release. This also gives me the opportunity to get a later BOINC client onto them via LocutusOfBorg's PPA.

BOINC release
The 7.6.9 that I was testing became the release version last week, so if you're running an older version I'd suggest you upgrade to this one.

Future hardware
I mentioned last week the i7-6700 series of CPU's. After looking on the Intel web site today there is some more information. It seems there is an i7-6700K which is 4 core 8 threads running at 4GHz with a TDP of 91 watts. It was the only one listed two weeks ago. There are quite a few listed now. There is an i7-6700 4 core 8 threads running at 3.4GHz with a TDP of 65 watts. I am interested in this lower wattage version at the moment.

30 August 2015

Farm news
New proxy server machine delivered and it's up and running. Older one has been turned off but is still around - just in case.

I've been going through Windows 7 installed updates and deleting the Windows 10 ones off (the ones that are supposed to ease the migration to 10). I read that Microsoft are planning on putting the same telemetry onto Win 7 and 8 so it's probably time to look more seriously at running Linux. I don't have anything of interest on my number crunchers but I do like to set them up MY way, not the way Microsoft want, and that is my main concern, well that and the fact they keep putting broken drivers out.

I finished off the batch of climate models. I need to check the climate prediction website regularly as you never know when new models will be released. Seti were running short of work so I did Asteroids work for a few days before switching back to running Seti tasks.

BOINC updates
We got 7.6.9 this week, which removes a feature they added in 7.6.8 to do with shutting down the core client when it's started as a service. There is better reporting of ARM processor features if you're running Linux on an ARM. There are some formatting changes to numbers (thousands separators) and times (show days if greater than 24 hours).

Raspberry Pi updates
They updated their Linux kernel from 3.16 to 4.1 even though they still have not switched over to Debian Jessie yet. You can run Jessie by changing the apt sources.list however the BOINC client in it is 7.4.23 and the current release version is 7.6.6.

22 August 2015

Farm news
The new proxy server bits have arrived at the shop and are being assembled. Not too sure if it will be finished this week or not. The current one is working fine so there is no urgency on this.

Seti had a couple of outages due to network changes on their end as well as their regular weekly outage, so I switched the machines to Asteroids when that happened. I currently get about 5 days a week on Seti and 2 days for Asteroids before switching back again.

Last week had a couple of cool days so I managed to get the GPUgrid cruncher going for a quick burst. Their "long" work units take 12 hours each so I only managed to get about 4 work units done.

ClimatePrediction had some new ANZ work units which I missed the week before. This week there were some more and so I have all the Intel GPU machines crunching them. Due to the size of their upload files I am back to managing the uploads which keep getting backed up. I have some "spare" work units as I only run 4 at a time on the i7's. They will take until the middle of next week before they complete.

Hardware upgrades
Intel have announced (if not released) their 6th generation i5 and i7 CPU's. The i5-6600 and i7-6700 series. I am looking into getting the ASUS Z170 motherboard and i7-6700 as replacements for the 3rd generation (i7-3770) machines that I have. I am not too sure if the HD graphics 530 is useful for crunching or not yet.

09 August 2015

Farm news
Intel-GPU's mostly doing Seti but they had their regular weekly outage when the farm ran out of work. Then they had a special database backup running just before the weekend so I switched to running Asteroids for a day. Asteroids have also run out of work.

The two GPUgrid crunchers were running work for Asteroids and GPUgrid, but today was warm so the GPU's were set to finish off their work. Due to Asteroids running out of work I started a batch of Einstein work.

The Raspberry Pi's and Parallella's are doing Einstein BRP4 work.

With Windows 10 I am taking a wait and see approach after the initial problems. I might try installing it again in maybe a month or so. As an experiment I might try putting another hard disk into the backup file server and try installing Win 10 on it. It's an old Pentium 4 and only has 1GB of memory so probably won't work.

04 August 2015

Farm news
Intel-GPU's are still crunching away for Seti at the moment. We had a few warm days so no Nvidia GPU's were used until today. Currently one is running Seti. The Parallella's and Raspberry Pi's are doing Einstein work.

A new version of VirtualBox (v5) was released. I don't run any projects that use Vbox so I haven't installed it on any of the number crunchers.

Windows 10
Came out with a bang. Lots of interest and much gnashing of teeth at the privacy options. You can turn them off as part of the install process, although they really should be off by default. I highly recommend people disable most things unless you want it sharing everything about you, your contacts and WiFi passwords. I tried installing it on one of the Intel GPU crunchers. It got hung up installing the HD 4000 graphics driver from Intel, despite the fact I already had it installed. It didn't recognise the monitor so set itself to 1024x768 resolution (my monitor's native resolution is 1680x1050). Despite that I spent a bit of time playing around with it. Edge, the new browser, was quick at displaying web sites. The upgrade also reverted my Windows time settings.

In the end I went back to Win7. It might behave better as a clean install but I think MS need to do some more work with the video drivers. It really should let me decide what one I want rather than trying the latest release. Nvidia users are also reporting similar issues where it decides what driver to install regardless of what the user wants. The one it installs doesn't work for number crunching. This was the Pro version too.

26 July 2015

Last week we got a bunch of Microsoft updates, nothing unusual about that. It's known as Patch Tuesday, where MS release their monthly fixes. What was unusual was two days later there was an urgent security update. There is also a microcode update from Intel (it's considered optional) which is the first time I have seen one of them.

The Seti optimisers released an updated installer and apps. I spent a bit of time this weekend updating all the machines that run Seti with these new apps and their various configuration files.

Project news - Seti
They got a bit of money for the next 10 years as part of Yuri Milner's "Breakthrough Listen" initiative. With the publicity there has also been an influx of new users. This money is expected to be spent on adding receivers on a number of other radio telescopes so they can expand the search from the small portion of the sky that the Arecibo radio telescope can see.

We've already done a fund-raiser so they could build a receiver for the Green Bank, USA radio telescope. I have seen the Parkes, Australia radio telescope mentioned.

20 July 2015

Farm news
We are mostly crunching Seti work on the Intel-GPU machines, Asteroids and GPUgrid on a couple of the Nvidia-GPU machines. I finally finished off the last of the climate models yesterday and managed to get them uploaded.

BOINC news
We got 7.6.6 to test this week. The main change with this one is to check for files being locked and wait a second if they are before retrying. This should correct the issue of truncated stderr files being uploaded. The preferences were updated back with 7.6.2, so it includes those changes as well. This one is a release candidate. This will probably be the last version for a while seeing as there are no paid developers any more.
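The wait-and-retry behaviour described above can be pictured roughly as follows. This is not BOINC's own code; it is a minimal Python sketch, assuming a locked file shows up as a `PermissionError`. The name `open_when_unlocked` and its parameters are hypothetical.

```python
import time


def open_when_unlocked(path, mode="rb", retries=5, delay=1.0,
                       opener=open, sleep=time.sleep):
    """Try to open a file, waiting `delay` seconds between attempts.

    Assumes `retries` is at least 1. `opener` and `sleep` are parameters
    only so the behaviour can be exercised without real files or real waits.
    """
    for attempt in range(1, retries + 1):
        try:
            return opener(path, mode)
        except PermissionError:
            if attempt == retries:
                raise          # still locked after the last attempt, give up
            sleep(delay)       # wait a second, then try again
```

Reading the file only after the lock has cleared is what would avoid picking up a half-written stderr file.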

Windows 10
It's about 9 days away from release now. I haven't seen it, just some demos on the Microsoft site, however the people using it seem to like most of it. I am told it will be available on a USB memory stick (aka thumb drive). I will need to get one of them if I am updating all the machines but I don't have any immediate plans to update.

Apparently the Home version will have updates applied automatically, but the Pro versions have the option of choosing when to apply updates. This would cause problems for any machine crunching climate models with the Home version because the models don't like being interrupted.

12 July 2015

Farm news
Most of the farm picked up some more climate models last week so we're running them out now. There are no more ANZ models for the time being. Given they take a week and a few machines have spare work units it will be at least another week before they are all done.

I fired up one of the GPUgrid crunchers and it's been steadily crunching through their work units, some short ones (only 4 hours) and some long (up to 12 hours). Once the backlog of climate models is done I will get the second GPUgrid machine running. I am waiting on one particular machine to complete its final model in about 20 hours.

I replaced the power supply in the backup file server so now it's quiet again. I managed to use the power supply I got last week. I had to break the last 4 pins off the motherboard power connector so it fits the old Pentium 4 motherboard. I used the machine last week to do some experimentation with Debian and the latest version of Squid (the proxy server). I used a spare hard drive that I swapped in. This week it's back to its former self.

It turns out Debian installs a firewall by default but it has no rules and so doesn't actually do anything. I tried the recommended rules and had to add a couple of my own. While doing regular housekeeping on the Pi's and Parallella's today I added the same settings so hopefully they are a bit more secure now.

Internet gateway
Now the file server and the backup file server are done another little "project" I have going is to set up a machine as an internet gateway. Not like your domestic grade router/firewall but having more advanced features like caching and virus scanning. I could just buy one but they get expensive. I was looking at an F5 but they are more of a load balancer.

I have given the computer shop some specs (motherboard, case and power supply) and as seems to happen quite regularly now they and their suppliers don't have the parts. There is no rush to get this implemented so I can wait. While the hardware is ordered I have yet to decide on the software that it will run.

05 July 2015

Farm news
The second batch of climate models that I had running finished and were getting upload issues. They needed a bit of manual intervention to get them through. Their files are rather large and when you have multiple machines trying to upload at the same time it's very easy to max out what little upload bandwidth there is. I usually manually allow them to upload 2 at a time to work around this. Another project that also has huge files to upload is GPUgrid, but at least theirs are only uploaded once the work unit has completed.

The cable for the RAID card finally turned up and so the file server now has its hard disks in a RAID 6 configuration.

While I was in the computer shop sorting out the RAID I also got a 1Gb PCI network card for the old file server and a replacement power supply. The fan in the old power supply is making all sorts of noises so is likely to fail soon. The replacement power supply has too many pins on the motherboard power connector for the ancient Pentium 4. I will have to use it in another machine. The network card has been installed and is working fine.

BOINC funding expires
Dr Anderson announced to the mailing list that the funding from the NSF ran out so there are now no dedicated BOINC developers. It will become a volunteer effort to maintain and update it. I am not too sure what that means for its future. Hopefully some of the suggested changes that have previously been ignored will get incorporated into it. I would have liked to see development of the Superhost idea too.

28 June 2015

Farm news
We're back to running ANZ climate models. CPDN threw out a few thousand and I managed to get 98 of them. I had to do some "manual intervention" to get BOINC to pick up these as all the machines had a full load of Asteroids work in flight.

The farm is currently struggling to upload all the result files now the first batch of 32 CPDN work has completed. There are lots of big files trying to upload at once. Despite having a theoretical upload speed of 1Mbit (Speedtest says 700Kbits) I can only get around 100Kbps to the CPDN servers. I am managing it by suspending file transfers and only allowing 2 machines at a time to upload and only a single transfer each. Of course the work units all finish around the same time which is why I need to manage it.
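I do this throttling by hand in BOINC Manager, but the rule itself is simple: at most two machines uploading at once, and each of those sending one file at a time. As an illustration only, here is that rule as a Python sketch using a semaphore. `upload_all` and `send_file` are hypothetical names, and nothing here talks to BOINC or to the CPDN servers.

```python
import threading

MAX_UPLOADING_MACHINES = 2  # two machines at a time, as described above


def upload_all(machines, send_file, limit=MAX_UPLOADING_MACHINES):
    """Upload every machine's pending files with at most `limit` machines active.

    `machines` maps a machine name to its list of pending file names.
    `send_file(machine, filename)` is a hypothetical stand-in for one transfer.
    """
    slots = threading.Semaphore(limit)

    def drain(machine, files):
        with slots:                # wait for one of the upload slots
            for name in files:     # a single transfer at a time per machine
                send_file(machine, name)

    threads = [threading.Thread(target=drain, args=item)
               for item in machines.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a link this slow, letting two transfers finish is better than letting twelve of them crawl and time out.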

File Server
The cable for the RAID controller that is destined for the file server still hasn't arrived so that task is yet to be finished. At least the shop has the replacement case fan.

The replacement UPS arrived on Monday and was put into service straight away. It's the same as the previous model so the software to monitor it and the cable to the computer are the same. No changes required there at least.

Linux
I am looking at a couple of Linux distributions too. I have tried Debian 8 but got stuck. Where's the sudo command gone? You can install sudo from the repo but then you aren't in the group to use it, so I gave it the flick. I have an old copy of Mint 15 on CD so also gave that a try, but it looks like the repos are no longer available.

I am trying this on an old Pentium 4 and keep unplugging the hard disks from it so I can clean install it and then go back to what it had later. I have a box of old hard disk drives so swapping a PATA one out isn't a problem.

21 June 2015

Farm news
Nothing exciting happening at the moment. Just crunching away. I did a burst of Seti work and then switched over to running Asteroids work. There is no sign of new climate models.

While setting up an app_config file to control the Einstein GPU app I discovered there was an easier way to control how many tasks a project runs at the same time. For Climate Prediction I had specified each of their model types with a max_concurrent tag which limits them to that many running at the same time (per model). Each type has a separate app that is specific to that climate model. I found that there is now a tag project_max_concurrent that controls it for all types. Rather than specifying each app can only run 4 at a time I can just put a single statement in and tell it to run 4 in total, that way if I get a mix of models/apps it will still do what I want.
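For anyone who wants to see what that looks like, the small Python sketch below builds the contents of an app_config.xml with a project-wide limit, and optionally the older per-app limits for comparison. The helper name `build_app_config` and the app name `model_a` are made up for illustration; the real app names come from the project.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_limit, per_app_limits=None):
    """Return app_config.xml text with optional per-app and project-wide limits."""
    root = ET.Element("app_config")
    # Older style: one <app> block per application, each with its own limit
    for app_name, limit in (per_app_limits or {}).items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    # Newer style: a single limit covering every app in the project
    if project_limit is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config(4))
    print(build_app_config(None, {"model_a": 4}))
```

The text from the first call is what would be saved as app_config.xml in the project's folder; the client needs to re-read its config files (or be restarted) before it takes effect.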

They say Windows 10 will be a 3GB download, which is why I would prefer a DVD that I can use on all the machines. I have read some on-line commentaries saying they don't think it will be stable by the end of July.

I am also looking at the server operating system to bring the file server up to date now that the hardware has been refreshed. On that note I am still waiting for the RAID controller cable to arrive so it's not quite completed yet. The shop thinks it will arrive next week after it got back-ordered.

Raspberry Pi's
Q: What is better than a Raspberry Pi2?
A: Two of them!
I have three Pi2's. Now that they are stable, after re-imaging them a couple of weeks ago, I decided to overclock one of them. It lasted a day before it crashed. The default Pi2 overclock of 1000MHz is too much for them even with a heatsink on the CPU. I removed the overclock (edit the file /boot/config.txt) and it's back to factory defaults again. I might try again with lower settings than 1000MHz. It could also be the memory speed and/or overvolt that makes it unstable.

We're still waiting on Raspbian switching to the Jessie release. At the moment their last official release of Raspbian was on the 5th of May based upon Wheezy. Debian are on Jessie as their release version.

15 June 2015

Farm news
The UPS on the file server died. It hadn't been holding its charge, regularly dropping to 94% battery and then quickly recharging. Then it started beeping and complaining "Battery failure". So I ordered another one (both Vanguard VGD1000) on Friday. I plugged the old one in on the weekend so I could get the error message and when I turned it on there was a bang and the smell of ozone. Well that is the end of it. I probably could have changed the battery before, but not now. Oh well the new one is in and running, hopefully I will get more than 2 and a half years out of it.

I am still waiting for the RAID card and cable so I can get the drives in the file server set up. At the moment I am using a single drive.

On the crunching front the ANZ climate models ran out. I finished off the last one this morning. As the various machines finished theirs off I switched them over to Asteroids work, so now all the Intel GPU machines are running it. I gave the 6 core/12 thread machines a run at Seti work when they finished their climate models but they are currently off.

Windows 10
I updated the 6 core/12 thread machines with all the latest Windows updates so they now have the icon to get Windows 10. I'd rather not download 10 copies of Win10 when it comes out so I will have to see if I can purchase a DVD. I would also prefer to clean-install them but will wait and see how the release goes, no doubt there will be some teething problems when it gets released at the end of July.

In that vein I am also trying to keep the graphics drivers up to date. Nvidia comes out with new drivers about once a month but they are usually just profiles for the latest games. Intel also comes out with new graphics drivers every couple of months, however theirs usually don't work well for number crunching so we tend to run older drivers which might not work with Win10.

Network infrastructure
I am also looking at the network setup with a view to making it a bit more secure and centralised. It will probably entail a new machine to replace the old file server which is doing other duties at the moment. There will probably also be some software changes and maybe a managed switch to allow network card teaming. Now all I need is my own networking consultant.

06 June 2015

A slightly different approach to the blog this week. Rather than write about what the farm is doing I thought I would show it in the screen shots below. They are taken from BOINCtasks showing the various groupings of machines.

This lot are what I call the Intel-GPU machines. They are all i7-3770's with integrated HD4000 graphics. As you can see we are only using half the available threads. That is because the climate models seem to take around 50% longer if run on all threads.

These are the Nvidia-GPU machines. Actually it's only two of them, the 6 core/12 thread machines. As you can see they too are running climate models on half the threads. With a bit of fiddling I could have them also running GPU work on their dual GTX750Ti's but given the climate models come in bursts of work and then none, I haven't bothered.

This is the Raspberry Pi part of the farm. There are three Pi2's and a B+ in there. As you can see they are all running Einstein work.

There are also a couple of Parallella's but I haven't bothered taking a screen shot of them. They are running Einstein work as well.

I got a one question survey from Adapteva (the people who make the Parallella) asking what we wanted to see on it. My answer was "FFTW running on the Epiphany chip". The reason being that a lot of the projects use FFTW and if it can run on the Epiphany and is able to do them quicker than the ARM cores then we can get a performance boost for a number of projects.

Farm news
In other news the shop got the RAID card that I wanted to put into the file server but it didn't come with any cables so they now need to source a cable before I can make use of it. I haven't paid for it yet. If they can't get the cable then it's useless and they can return it to their supplier.

A couple of the Pi2's have been having issues recently. I ended up re-imaging them with the same image that the one reliable Pi2 had. They've been running for the last 3 days so hopefully that has cured the reliability issues. It could have been the overclock or the frequent kernel updates that made them unreliable.

Windows 10
There is a fair bit of chatter in the message boards about Windows 10 and Microsoft having rolled out a number of updates to Windows 7 and Windows 8 users that give an icon to do the upgrade when it's released on July 29th. Most of the crunching machines are running Win7 so it's a free upgrade for me. The Parallella's and Pi's are running Linux. There is Windows 10 available for the Pi2 but I can't see myself switching them to Win10.

I haven't installed the Win10 preview so I am not too sure how it's going to look. I have seen some of the video clips of the preview which show a lot of features that I don't want or need on a number crunching machine. The preview updates daily and may not look like the final product yet. I will look at putting it into a virtual machine and see what it looks like on there before deciding if I will be upgrading the Win7 machines.

31 May 2015

Farm news
The farm had a mixed week with running out of Climate Prediction work and then doing Asteroids and Seti work. This weekend saw the return of the Climate work units so I ran down the cache on the Intel-GPU machines so they can run without any contention.

I have even put one of the 6 core/12 thread machines on the job. It's running 6 at a time. They haven't run CPDN work before so it will be interesting to see how they go. I am sure they will work, I'm just not sure how long they will take.

BOINC testing
We got 7.6.2 late last week and I have it installed on a couple of machines. I have made a few suggestions about how the new Options windows should appear. I don't expect anything will happen but hopefully they will improve before it becomes a release version.

File Server
The file server is back and running. At the moment we've only hooked up a single hard disk. I have ordered an Intel RAID controller so there is no point running the others until it's installed. We're not too sure if the controller comes with cables or not and it's at least a week away. If there are no cables then they have to be ordered, which means another week's delay.

The shop put a PWM fan in the top of the case. It continually revs-up and then slows down. I've told them they will have to change it to a normal one. It's noisy and very annoying.

Proxy server
I run one of these to try and reduce the downloads. Since switching back to using the old file server I noticed it doesn't seem to be saving much of anything so have spent some time this weekend going through the logs and adjusting settings.

The trigger for this was the CPDN downloads, which bring down an 80Mb file per machine as well as a number of 20Mb and 26Mb files, all of which get duplicated on the other machines. Windows Update does the same.

I am also looking at replacing it with something more energy efficient as well as having more CPU power. It will become a dedicated proxy server.

24 May 2015

Farm news
Crunching continues with the Intel-GPU part of the cluster finishing off their Climate Prediction models. They ran out of the ANZ ones. I can do some of the short ones but they download large amounts of data and do a couple of 64Mb uploads. I did some of them but most seem to be resends of failed work from late last year and they fail anyway.

I had the 6 core/12 thread machines doing Asteroids work but Asteroids went off-line Saturday so they are now doing Seti work.

I did try some GPUgrid work but they too ran out of short work units. It seems they've refilled the queue so I will get them running again once the 6/12 machines finish off their work. Despite the cooler weather I can't run both at the same time because the room gets too hot.

File server
The shop got the socket 2011-v3 mounting only to find out the manufacturer uses a non-standard mounting so you have to purchase their cooler. Because they are proper servers they offer coolers for 1U, 2U and 4U rack mount. I have it in a tower case so size isn't an issue.

Maybe they will get it working next week so I can pick it up on the weekend. I will then need to set it up the way it was before.

I have also been researching RAID controllers. The motherboard has 2 built-in RAID controllers that can do RAID levels 0, 1, 5 and 10. These days RAID 6 (or 6+0) is recommended due to the probability of losing a 2nd drive while rebuilding the array. RAID 6 normally requires a chip to calculate the 2nd set of parity bits. I have narrowed it down to two cards of interest, one is an Adaptec and the other an Intel. Given the delays with the file server I have been holding off ordering one.

Future upgrades
One thing will be to update the CPU's in the 6/12 core machines. They currently have i7-5820k CPU's but I am thinking of updating them to the i7-5960x. Another option would be a Xeon as they also fit into a socket 2011-v3.

Another thing on the list is to update the proxy server to something more modern and more energy efficient. It could be an i3 or i5 based machine. At the moment the backup file server (a Pentium 4 @ 1.8GHz) is doing the job but it's slow.

16 May 2015

Farm news
The Intel-GPU machines have all been doing Climate Prediction work. Well they were up until this afternoon when the CPDN database went off-line so I can't report work or do trickles. I have suspended them and we're concentrating on Asteroids work now.

I noticed a lot of the CPDN work units taking 165+ hours to complete when they normally take 110 hours. I put this down to CPU cache contention running 8 at a time. I am now running only 4 at a time per machine.

I did manage to do some GPUgrid work during the week. I was also testing a fix to BOINC to do with reuse of slot directories. The official fix will be coming in 7.6, at the moment we are testing the user option settings.

The Raspberry Pi's still haven't officially switched to Debian Jessie yet despite it becoming the stable release a week ago. Hopefully they will get there in a couple of weeks.

File server upgrade
The file server is off in the shop for an upgrade. The shop have already encountered their first problem. As a result they now have it until next weekend. The CPU cooler doesn't have the right mounting bracket for an Intel socket 2011-v3 so they will need to get one in. The case was full of dust despite regular cleaning and dust filters. I am taking the opportunity to replace all the case fans. They are still working, it's more preventative maintenance seeing as the server runs 24x7.

I still need to find a suitable RAID controller for the file server that can do RAID 6 or better. The motherboard supports RAID 5. There is a real chance of a second drive failing around the same time as the first one, particularly while the array is rebuilding, and RAID 5 can only handle losing one drive, so they recommend RAID 6 or better these days.

02 May 2015

Farm news
This week had a bit of the usual crunching, some Linux upgrades and a hard disk failure. Oh and finalising the file server upgrade.

Crunching is continuing for the Climate Prediction ANZ work units which take about 5 days to complete (each). The Intel GPU part of the cluster is running them. I also have been running a few GPUgrid long work units that take about 11 hours on the GTX970 and Asteroids work on the CPU cores.

Debian Jessie was released to the public along with Ubuntu Vivid. I was already running Jessie on the Raspberry Pi's but they haven't officially updated to Jessie yet. I tried upgrading the Parallella's to Vivid by going to Utopic (14.10) and then upgrading it to Vivid (15.04) but that failed. I had to reimage the SD card back to 14.04 and then update to Utopic. I suspect the kernel is too old as it has been stuck on 3.12.0 for quite a while.

The hard disk in one of the GPUgrid crunchers failed after it had been running overnight. Fortunately I have a few spares so it's been swapped out with another of the same vintage. I had to reinstall Windows, BOINC and a few other apps. It had a WD Black manufactured in June 2012 and they have a 5 year warranty. The on-line retailer has gone out of business so the only option is to return it to Malaysia. Given the postage cost it's not going to happen. I am surprised that WD don't have an Australian distributor or a collection point.

File server upgrade
The other thing this week had me chasing up the computer shop in regards to the file server. I can't get the CPU I was originally after. Intel only shows them being available in trays, which means buying 50 or 100. I settled on the next best CPU, a 6 core/12 thread Xeon with an 83 watt power rating. Memory unfortunately has to be ECC and DDR4 so that is costing a bit.

I have also ordered some 4Tb WD Se drives to go into the file server. I can then reduce the number of drives and still have more disk space. They will be in a 3 drive RAID 5 configuration which should give around 8Tb of usable space.
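The usable space figure comes from simple arithmetic: a parity RAID level gives up one drive's worth of space (RAID 5) or two drives' worth (RAID 6). Here is a small Python sketch of that sum. It ignores filesystem overhead and the difference between marketing and binary terabytes, and the function name is just something I made up for the example.

```python
def usable_capacity_tb(drive_count, drive_size_tb, parity_drives):
    """Usable space for a parity RAID array, ignoring filesystem overhead.

    parity_drives is 1 for RAID 5 and 2 for RAID 6.
    """
    # RAID 5 needs at least 3 drives, RAID 6 at least 4.
    min_drives = parity_drives + 2
    if drive_count < min_drives:
        raise ValueError(f"need at least {min_drives} drives")
    return (drive_count - parity_drives) * drive_size_tb


# The planned array: 3 x 4Tb drives in RAID 5.
raid5_planned = usable_capacity_tb(3, 4, parity_drives=1)

# For comparison: RAID 6 needs a fourth 4Tb drive to give the same space.
raid6_four_drives = usable_capacity_tb(4, 4, parity_drives=2)

print(raid5_planned)      # 8
print(raid6_four_drives)  # 8
```

So moving to RAID 6 later would mean adding a fourth drive just to keep the same 8Tb.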

24 April 2015

Farm news
For the last few days Sydney has been hammered by a storm with strong winds and rain.

Meanwhile crunching continues. It's quite appropriate that I am running climate models. All the Intel GPU machines are running CPDN models, mostly the ANZ (Australian and New Zealand) ones that take around 120 hours with a few short models that only take 50 hours.

I managed to also get a burst of GPUgrid work done. They keep running out of their short work units which take around 3 hours so I opted in to the long work units that take 9-12 hours. I ran a few before I had to button up the house due to the storm.

More upgrades
The continual upgrades keep happening. This time I am looking at the file server. It will get a new motherboard and CPU, and I will steal the memory out of one of the 6 core/12 thread machines, which in return will get faster memory. The rest of the parts will get reused.

The end result will be a file server that can expand as the new motherboard has 10 SATA III ports (old has 6 SATA II and they're all in use). It has a few PCIe slots (old only has one), it supports disks larger than 2Tb and the CPU power drops from 120w to 85w.

I will update the hard disks at some later date to bigger capacity ones but fewer of them. It only takes 3 drives to make a RAID 5 array. There is no urgency to replace the existing drives which are 4 x 2Tb.

17 April 2015

Farm news
The weather has been cooler for most of this week so I have had the Intel GPU machines running climate models (still going, they're up to 113 hours so far) and some Asteroids work. Asteroids have finally fixed their missing files issue so work is now flowing again.

CPDN announced that they will only target 1 particular platform (Windows, Linux or Mac) for each type of climate model in future to save on development and improve their reliability. I would think that it may be easier to issue the work units as VirtualBox VM images so they don't need to get involved in which operating system to target.

Intel driver update
Intel released driver 10.18.10.4176 for the HD4000 so I was trying it with Einstein. It actually seems to work. The last few releases from Intel haven't worked. I didn't do many work units but managed to get the BRP4 work units done and then some Parkes PMPS XT (aka BRP6) work units. The bad news is it's quite a bit slower than the (recommended) 10.18.10.3621 driver. I didn't try it with Seti and have since gone back to the 3621 driver as it's faster.

BOINC testing
We got an early look at the preference changes in 7.5.0. They seemed to work fine but I have suggested some cosmetic changes. Others have also asked for additional settings such as an "in use" and a "not in use" set of preferences. No word yet on them coming or not.

Windows updates
Got a few fixes again for Patch Tuesday, as it's known. There was the usual run around to update the farm. Also a few for the Raspberry Pi's (Debian Jessie).

While that has been going on I have been trying to get the Windows time software (w32time) to behave and keep the PC's clocks more accurate. Microsoft chose to do their own version of the ntp client that works somewhat differently from the standard ntp software. Anyway after fiddling with a few things and using Google a lot I have them working as they should be.

03 April 2015

Farm news
The weather is cooler now so I am managing to get a bit of work done. Last week I managed to get 3 of the Intel-GPU machines running CPDN work all weekend. They have some HadCM3S (short) work units that only take around 16 hours to process. The bad news is they produce 2 upload files each that are 64Mb, so then they kill the internet connection as they all try and upload at the same time. I restricted the number of uploads for them to try and improve things. This week I am running more of them, but only on two machines.

The 6 core 12 thread machines are running Seti work at the moment. I'm trying to get the Seti credits up to the Einstein credits (they are about 200,000 less). The only Einstein work I am running at the moment is on the Raspberry Pi2's and Parallella's.

Proxy server
I've been running Squid 2.7 for some years now without any updates, so the last month has seen me updating it to something more current and trying to get it to behave. One advantage to this is I can finally use HTTP 1.1 (just as they release 2.0). Squid 2.7 only supported HTTP 1.0. There is some more fiddling and optimising I need to do that takes time. This was further complicated by getting a new router which has various firmware bugs and loses some settings when rebooted.

NTP
There was a security bug discovered with NTP (the Network Time Protocol) and the Linux guys were pretty quick with a fix. The problem was they now ignore user settings. That led to some late nights trying to work out why it seemed to ignore my config settings and do something totally different to what it used to do. The init script that starts it up uses its own DHCP-derived config file so I had to fix the ntp init script to point to my config file.

The fix version seems a bit behind the "official" release from the ntp.org website which is up to 4.2.8p1. I expect the Linux guys will update to that eventually.

Future purchases
I'm looking at the network infrastructure at the moment and what needs to be done to make it more secure and reliable. I suspect a new (dedicated) proxy server and possibly new file server may be on the shopping list. More on this when I have a better idea how to organise the network.

15 March 2015

Farm news
We're crunching overnight at the moment. Not all of the machines as it's still not cool enough overnight but some of them are getting work.

The Pi's and the Parallella's are still running constantly too.

This week saw a bunch of Windows updates, further complicated by the fact most of the Windows machines have been off for the last month, so there was a heap of updates to download and apply.

I installed the GTX970's into the GPUgrid crunchers and have run a "short" work unit on each machine (they take around 2 hours). The GTX670's will go up on eBay soon.

BOINC testing
We got 7.4.42 for Windows to play with. No major changes just some bug fixes.

Project news - Asteroids
They've been having issues with work units failing to download (server side issue) and the guy that runs it is working in another city so it's been left alone. It also was getting a certificate error, but he has managed to fix that. It's out of work at the moment and we're waiting on more.

Project news - CPDN
They've been restricting their work units to different computers, so the ANZ climate models which I used to run are now restricted to Mac computers only. The EU climate models I can get on the Windows machines and they take around 9 hours. The remaining work units are restricted to Linux hosts.

01 March 2015

Farm news
Everything is off except the Raspberry Pi's and Parallella's. Today got up to 32 degrees, which is basically how hot the room with the computers gets. I really need to look for an alternative (air-conditioned) location for the computers.

Raspberry Pis and wisdom
Much like the last fortnight I have been concentrating on tuning the Pi2's to get the most out of them. That involved the fftw wisdom files which tell fftw what function choices are quicker. The Einstein app that runs on the Pi's and Parallella's will use it if it's there.

I ordered some more copper heat sinks as I only had 2 sets, but 3 Pi2's. I also got some USB cables with power switches, as the only way to power them off is to unplug them. The power cables arrived two weeks ago and the extra heat sinks last week.

The timings I am getting for the Einstein BRP4 tasks are around 16.17 hours for the Pi2 with wisdom and 13.83 hours with the wisdom and overclocking to 1GHz. In contrast the B's and the B+ take around 31.5 hours. I have generated a wisdom file for the B+ as well (all the B's have been retired) and will have to wait for some results to see if it helps.
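To put those timings in perspective, here is a quick Python sketch that turns them into speed-up ratios. The numbers are the per-task hours quoted above. The labels and the helper function are just for the example, and it only compares the time for a single task.

```python
# Per-task run times in hours for Einstein BRP4, as quoted above.
timings_hours = {
    "B and B+": 31.5,
    "Pi2 with wisdom": 16.17,
    "Pi2 with wisdom and 1GHz overclock": 13.83,
}


def speedup(baseline_hours, hours):
    """How many times faster a task finishes compared to the baseline."""
    return baseline_hours / hours


baseline = timings_hours["B and B+"]
for label, hours in timings_hours.items():
    print(f"{label}: {hours} h, {speedup(baseline, hours):.2f}x the B/B+")

# Gain from the overclock alone, Pi2 against Pi2.
overclock_gain = speedup(
    timings_hours["Pi2 with wisdom"],
    timings_hours["Pi2 with wisdom and 1GHz overclock"],
)
print(f"Overclock alone: {overclock_gain:.2f}x")
```

On these figures the Pi2 with wisdom finishes a task a bit under twice as fast as the B+, and the overclock adds roughly another 17% on top.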

Generating a wisdom file can be tricky because the Einstein app has been compiled with fftw 3.3.2. Debian Wheezy has 3.3.2, Debian Jessie has 3.3.4 and Ubuntu Trusty has 3.3.3. The wisdom file has to match the version of fftw. For the B+ I had to put Wheezy on an SD card, boot up the Pi with Wheezy, generate the wisdom and then copy it over to Jessie.

Pi surgery
I also did a bit of surgery on one of the cases to see if it helps with cooling as you can see below.

The idea was the same as the older B model Pi that I mounted a fan on top of, however I thought I would try with just the grill and see how it goes. The Pi with this is running about 5 degrees (C) hotter than the ones without their top on. The case design is curved so it's difficult to mount a fan.

15 February 2015

Farm news
It's still hot so nothing much running except the Parallella's and the Raspberry Pi's.

The GTX970's arrived but I haven't even opened the parcel they came in as I have been concentrating on the Pi's. When it cools down a bit I might get them running.

Raspberry Pi's
It was a bit of a surprise that the Raspberry Pi Foundation released the Pi2. Last week I ordered first 1 and then another pair of the Pi2's. I spent quite a bit of time trying to get them to work until I swapped the power supply. It seems the dual USB chargers that work fine with the B and B+ are not what the Pi2 wants. Once I worked that out it was a straightforward process to get them going.

Unlike the B+ they now have a quad-core ARM SoC (System on a Chip). On top of that they are an ARM v7 and have 1Gb of memory. They cost the same as the B+ and have the same board layout. Even Microsoft is looking at running Windows 10 on them. Currently they run the same Debian as the B+ but with an updated bootloader and kernel. There is even an Ubuntu Snappy available for them.

I have 2 up and running at the moment. I have been running the B's down and retiring them as they complete their last task. I will probably keep the B+ for a bit as I am using it for some Seti app testing. Still waiting on a case for the second pair as they were out of stock at the time. I will probably get different cases as I need to add a fan. The little case offered on element14 isn't suitable to add one (the case is thin plastic and curves). Most B+ cases have holes for the Pi camera module which doesn't help either.

I have completed some Einstein work so far, with the Neon apps being slower than the Parallella's by about an hour, but I need to try a few more to get better timings. Once done there is an fftw plan that I wanted to try to see if that makes any difference, and then finally I will look at overclocking them once I have a better cooling solution.

30 January 2015

Farm news
Weather has been hot so not much running again. The last few days were wet and cooler so I managed to get some of the farm crunching. Overnight I managed to have one of the GPUgrid crunchers processing their "long" work units (they ran out of short ones). I also ran a few CPDN work units (they take 60+ hours).

I have ordered a pair of EVGA GTX970 cards that will go into the GPUgrid crunchers. While my existing GTX670's can still do the work, the work units keep getting bigger and they are taking longer to complete. The new ones should use less power and produce less heat. I will put the old cards up on eBay once they've been swapped out.

Speaking of bigger work units GPUgrid did a special batch of "very long" work units that were expected to take 24 hours on the fastest of cards ("long" work units take 8 to 12 hours). A lot failed at the end because the project forgot to increase the maximum allowed file size. They create 185Mb result files. They have since corrected the file size limit and a number of users did a work around.

Seti App testing
Claggy has managed to get a Seti Multi-beam app compiled for the Raspberry Pi and the Parallella.

In early testing we found that the current 7.28 version would run but failed to validate. He has gone back to an earlier 7.0 version. I have done a couple of normal work units (they take around 66 hours on the Pi) and then I got a vlar (very low angle range) work unit that took 157 hours on the Pi. These at least validate.

The Parallella is faster but I had trouble transferring the program across. It seems the SD card partitions aren't set up as mountable. After posting a message on the Seti forums it seems one has to do the following:

11 January 2015

The weather has been hot so everything has been off for the last two weeks, except for the Parallella's and Raspberry Pi's. At the moment they are out of work due to Einstein@home having a problem with their upload server.

I took the opportunity to take the Parallella's apart and clean them. I have a 60mm fan mounted on top of the standard case so they accumulate dust. The fan blades had built up dust, and the Parallella board and the inside of the case were covered in fine powdery dust.

Today turned cooler so I have fired up the entire Intel-GPU part of the cluster (6 machines) and the two new 6 core/12 thread machines. The Intel-GPU machines are doing Asteroids work. The 6/12 machines were running a mix of Seti and Asteroids work.