Posted
by
Unknown Lamer
on Wednesday September 12, 2012 @10:50AM
from the build-a-beowulf-wait-a-minute dept.

hypnosec writes about a neat little hack using Lego, Raspberry Pis, and Scratch to construct a "supercomputer." From the article: "A team of computational engineers at the University of Southampton, led by Professor Simon Cox, has built a supercomputer using Raspberry Pi and Lego. The supercomputer comprises 64 processors and 1TB of storage (a 16GB SD card in each of the Raspberry Pis), and can be powered from just a single 13-amp mains socket. MPI is used for communication between the nodes over the Ethernet port. The team managed to build the core of the supercomputer for under £2,500. Named 'Iridis-Pi' after the University of Southampton's supercomputer Iridis, the supercomputer runs software that was built using Python and Scratch. Professor Cox used the free plug-in 'Python Tools for Visual Studio' to develop code for the Raspberry Pi." Lots of pictures of the thing, and a howto on making your own.

Gussy it up however you want, Trebek. What matters is: does it work? Will the Raspberry Pi supercomputer calculate large prime numbers? Because I've ordered devices like that before - wasted a pretty penny, I don't mind telling you. And if the Raspberry Pi supercomputer works, I'll order a dozen!

The ARMv6 processor took more than two days to render something that takes 10 minutes on a Core i5. So "supercomputer" this cluster is not.

* You may say, "Hey, this test is running using soft float! If you used hard float, it'd be faster!" Well, you would be right that it would render faster under hard float, but this processor still wouldn't finish rendering in less than a day, let alone come anywhere close to a Core i5 or Cortex-A9.

The image rendered is 384 x 384 pixels. The MSM7227's results are 0.70 pps and 1.17 pps/GHz. The Raspberry Pi is running at 700 MHz, so it should theoretically get 0.82 pps. It's possible (and fairly easy) to split up the rendering among all the CPUs in this cluster with some custom scripting, so this benchmark image could theoretically render at 52.42 pps. The Core i5-2400S I mentioned above renders at 235.18 pps!
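The "custom scripting" to split the render isn't shown in the thread; as a sketch, here is how the 384 x 384 image could be carved into per-node row bands, with the throughput arithmetic from the figures above (the `band_for` helper and the 64-node count are illustrative assumptions, and the real MPI communication is omitted):

```python
# Sketch: carving the 384x384 benchmark image into row bands, one per
# Raspberry Pi node. The actual MPI send/receive calls are omitted;
# this only shows the work division the comment describes.

WIDTH, HEIGHT, NODES = 384, 384, 64

def band_for(rank, size=NODES, height=HEIGHT):
    """Half-open row range [y0, y1) for one node; the last node
    absorbs any remainder rows."""
    rows = height // size
    y0 = rank * rows
    y1 = height if rank == size - 1 else y0 + rows
    return y0, y1

# Every row is covered exactly once across the 64 nodes:
covered = sum(y1 - y0 for y0, y1 in (band_for(r) for r in range(NODES)))
print(covered)  # -> 384

# Back-of-envelope aggregate throughput from the thread's figures:
pps_per_pi = 1.17 * 0.7              # 1.17 pps/GHz x 0.7 GHz ~ 0.82 pps
print(round(pps_per_pi * NODES, 2))  # -> 52.42, ignoring MPI overhead
```

In practice the stitching node and network would eat into that 52.42 pps figure, but the division of labour is this simple.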

There's a bit of an apples-and-oranges comparison there. You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be fairer to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

There's also a cost comparison. The i5 processor alone is ~$200, not to mention the motherboard, RAM, etc. Let's just say you can build a computer with an i5 for about $800. That's the same price as 32 Raspberry Pis. So if

You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be better to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

Based on those numbers, the quad-core i5 processor is approximately equal in performance to 335 Raspberry Pi cores (for this type of computation). Thus even a single core of the i5 would still be equivalent to almost 84 Raspberry Pi cores, and costs only about $600 even if you bu

The GP's figures are off. He is using a horrible compiler setup: not only is he using the soft-float calling convention, he is using -mthumb, which AIUI will prevent the code from making direct use of the hardware FPU at all on ARMv6 (and I suspect he was using Debian's version of libc, preventing even indirect use of the hardware FPU through libc routines). According to hexxeh, the POV-Ray benchmark gives the following results under Raspbian on a Pi.

Your price figures are off too. An i5-based compute node can be built for more like $500.

Similarly, the real price of a Pi node is quite a bit more than $25. Firstly, the Pi you can actually buy, and would want for this task (clustering needs networking support), has a base price of $35, not $25. Secondly, that price excludes things like the power supply, the SD card, the network cable and the mounting hardware. The real cost of a Pi node is probably more like $50.

So the per-node cost of a Pi is about 10 times lower than that of an i5 node.

My overall conclusion is that if compute power per dollar is your goal, then a smaller number of i5s is a much better bet than a larger number of Pis.
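That conclusion can be made concrete with the thread's own numbers (all figures below - the node prices and pps rates - are the commenters' estimates, not measurements):

```python
# Compute-per-dollar comparison using the figures quoted in the thread:
# a realistic Pi node at ~$50 delivering ~0.82 pps, vs an i5 node at
# ~$500 delivering ~235.18 pps on the same POV-Ray benchmark.

pi_cost, pi_pps = 50.0, 0.82
i5_cost, i5_pps = 500.0, 235.18

pi_pps_per_dollar = pi_pps / pi_cost
i5_pps_per_dollar = i5_pps / i5_cost

print(round(i5_pps_per_dollar / pi_pps_per_dollar, 1))  # -> 28.7
```

On these estimates the i5 node delivers nearly 29 times more compute per dollar, which is the point being made.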

Actually, I just ran a test, because I was a little amazed that the ARMv6 was so much slower than the A9. Here are my numbers.
Parse Time: 0 hours 0 minutes 14 seconds (14 seconds)
Photon Time: 0 hours 5 minutes 43 seconds (343 seconds)
Render Time: 5 hours 58 minutes 53 seconds (21533 seconds)
Total Time: 6 hours 4 minutes 50 seconds (21890 seconds)
While the Raspberry Pi wasn't faster than the A9 (I didn't expect it to be), it was way faster than the ARMv6 you tested on, most likely because it uses hard float.

Sounds like the old mainframe vs. x86 commodity server argument. Small mobile tech will take the place of many current PCs, because grandma doesn't need a PC to check email, use Facebook, play solitaire, and watch Netflix, but the PC won't disappear, just like the mainframe didn't disappear, because it still has many more uses. In fact, there are more mainframes sold per year now than there were in the pre-PC era. What we will see come from this is a continuation of the trend toward less and less expen

A supercomputer is any overall system that's I/O-limited, not CPU-limited like most machines - at least at full theoretical CPU use. It's hard to define a Raspberry Pi as anything other than I/O-limited, so... An alternative definition, more popular recently, is programmer-limited, as in it's hard to parallelize some algorithms. Either way it fits.

While technically RAM is not part of the CPU itself, I think you'll find most people don't consider memory access when calling something "I/O limited". That's more along the lines of network ports, hard drive, USB, firewire, thunderbolt, etc.

Calling this thing a cluster... fine. Calling it interesting for students to learn how clusters work... fine. Calling it a supercomputer? Maybe if the University of Southampton got sucked into a time vortex to the early 1990s, and even then, while the raw theoretical number-crunching capability of the RPis would be impressive, the lackluster I/O and interconnects would mean that even supercomputers of that time would still win on many common workloads.

I love to see Legos doing advanced things, but for a chassis? I feel like people can be very smart but sometimes afraid to learn how to build something with their hands. The lab example I posted above is at Cambridge University. Cambridge has a very competent engineering department - why not reach out to them? It could have made for an excellent project for some engineering students.


Aside from your Lego-assembling robot, Lego is always assembled by hand. It's also cheaper and faster to build a Lego chassis than to get the engineers to weld one up from mild steel.

Plastic is always a somewhat problematic case material for larger electronics: it can build up static electricity, and you can't provide grounding or RF shielding. However, I agree that LEGO is simply practical for sculpting quick prototypes.

I get 64 cores, a hell of a lot more memory, and more storage in a single quad-proc server. Does this make every new VM or DB server I buy a supercomputer? It's not even drawing as much power as this stack. Maybe they're planning on using their undocumented GPUs? I can throw a couple of those in as well and still trounce this setup. Am I missing something? Besides the putting them together with Legos with, I assume, his son.

Yes, you are missing something (though I have slight reservations about the 16-cores-to-a-die CPUs you claim to be using). There's this thing called education: your large server running loads of VMs is not going to be nearly as useful or informative at getting the ideas across as a rig like this. There is a big difference between working with virtual networks and seeing the hardware of a real network, as well as being able to program the thing with "small" languages, without monster frameworks just to make anything happen.

However, you do win a "Miserable git" award for being unpleasant about Prof. Cox.

You mean you have reservations about stock shipping AMD server procs? If you want education, you want to be able to do things like artificially inflating the latency of the linking network - that's easy to do with VMs. Test the effectiveness of different storage methods against the type of workload. Look at nodes with different processing capabilities. Honestly, I find it amazingly hard to fathom that it took a whole group of people to stack 64 SBCs, load them with an OS, and connect them up to a switch. This is a m

I'm a big fan of the RP project. But I'm a bit bored of seeing news items in which someone does something with this Linux box, which obviously a Linux box can do. Raspberry Pi compiles C! Raspberry Pi controls a robot! Raspberry Pi runs MAME! Well of course it does, it's a little PC, and that's what PCs can do.

You clearly need to turn in your Slashdot commenter license... to REALLY entice editors to post a story, work BitCoins into the mix. Oohh... better yet, work in references to the MPAA, And Ubuntu, and whatever else can be stirred into the pot. References to MAME are old school... (although that can be forgiven, Mr. 4-digit UID.)

How does this sound? "Raspberry Pi used to mine BitCoins to help pay an MPAA Lawsuit Fine. However, due to a security hole in Ubuntu caused by the new Unity interface, the new coins were stolen from the user by someone claiming to be affiliated with Anonymous. Wil Wheaton offers to sponsor a live D&D game played with Arduino-programmed robotic miniatures to make up for the lost funds."

Yep, you forgot about how kick-starter was used to fund the creation of the robotic miniatures. Also, the Raspberry Pi was actually running MineCraft which had a working implementation of an 8-bit processor that was doing the actual BitCoin mining. Researchers were observing the operation of the MineCraft processor using the Oculus Rift headset. In the future, the designers plan to port this all to the upcoming Ouya.

I think this is a great project for students, because it will let them develop and test simulations and other algorithms for parallel computing without tying up expensive "real" supercomputers. A bonus is that the relatively slow speed may encourage techniques to make such computations more efficient, with a corresponding payoff when the algorithm is put onto the real thing.

The meaning of PC was never really solid; it just means "personal computer." In this case I just meant "something with a CPU, RAM, a filesystem, keyboard, monitor and Ethernet" (although the keyboard and monitor aren't relevant to this particular project).

PC = personal computer. Also, running Windows isn't theoretically impossible, just very, very slow if emulating x86 (not least due to low memory and swap thrashing). Or you could get the source code for the ARM version of Windows 8 (which I'm sure the academic person could get) and hack it to work, too.

I would like to recommend the red and white suited astronaut lego people to maintain the server, or to work as sysadmins. They seem very dependable. If not them, then maybe the Lego people from the 70s that didn't have the smiley face painted on them. They seem more analytical and inclined to this type of work. Anybody remember them?

I just did the math. The Pi community supposedly recommends a minimum of 1A @ 5V if you intend to use any peripherals, including Ethernet. 700mA is the minimum draw with *nothing* connected.
5W x 64 = 320W. That's quite close to the max capacity of the power supply for the dual-socket machine I mentioned. The E5-2620 processors have a max TDP of 95W each.
Now, that doesn't count the auxiliaries - but there's still a 120W difference between typical power usage for the Pi, and MAXIMUM power usage for t
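For reference, the arithmetic behind that comparison (the 5 W per Pi and 95 W per Xeon figures come from the comment; auxiliary draw is deliberately ignored):

```python
# Power-budget sketch: 64 Pi nodes at the recommended 1 A @ 5 V each,
# vs a dual-socket Xeon E5-2620 box at its maximum rated TDP.

pi_watts = 5 * 64        # 64 nodes x 5 W
xeon_tdp = 95 * 2        # two sockets x 95 W TDP

print(pi_watts)              # -> 320
print(xeon_tdp)              # -> 190
print(pi_watts - xeon_tdp)   # -> 130
```

Using maximum figures on both sides the gap is 130 W; the comment's 120 W presumably compares typical Pi draw against the Xeon's maximum.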

Mmm, but it *is* a nice environment for *students* to experiment with the *principles* of parallel computing in a tactile manner.

I began learning to code on an 8-bit 2MHz CPU with 32KB of RAM. If I wrote an inefficient loop, I'd often notice the slowness without benchmarking. If I was careless with memory, my program would crash. On my quad-core laptop today, I only notice issues like that if I benchmark or do deliberate load testing. So working on low-spec systems is instructive.

Likewise, working with clusters of low-powered units on a slow comms bus is going to teach these students a lot about optimising parallel programs. They're going to have to deal with race conditions, memory ceilings, etc. which might not even show up on faster systems.
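As a tiny, self-contained illustration of the lost-update races they'll meet (plain Python threading here rather than MPI, but the hazard is the same for any shared state):

```python
# Classic lost-update race: several threads increment a shared counter.
# With the lock the result is deterministic; remove the lock and
# increments can be lost -- a bug that slow, genuinely parallel
# clusters surface far more readily than one fast core does.
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:          # remove this lock to see lost updates
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # -> 400000 with the lock held
```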

Exactly. Whilst the system isn't powerful, it is instructive in cluster design and programming, which is very relevant at a university.

They won't be running "real applications doing real calculations" on this thing. They'll be writing student-level clustered applications. For the price paid, it's probably a really instructive system for the university to have installed, if they make use of it in student courses and/or projects.

There is questionable benefit to having that tactile experience a) extend beyond a few nodes - certainly nowhere near 64 - or b) happen on hardware which resembles embedded systems, not real compute nodes. In short, it's teaching 15-year-old HPC technology in an era when you can fit 64 cores into 1U for a couple of grand.

So... you have a little server there. Good luck using it to teach a bunch of students how scalable clustered software works, how to write the software, what the pitfalls are, and so on.

Good luck running 64 separate VMs on your small server (not saying it's impossible, but I really wonder which one is faster to set up), and you won't be able to test any of the very different interconnects that easily.

Good luck running 64 separate VMs on your small server (not saying it's impossible, but I really wonder which one is faster to set up), and you won't be able to test any of the very different interconnects that easily.

Very easy indeed, and almost certainly quicker/easier to set up than the physical way, either using something like Vagrant [vagrantup.com] or by rolling your own scripts to drive VirtualBox.

However, I think it's instructive for students to do it the physical way first. By analogy: first understand LANs, then learn about VLANs.

When I was in university, I took a parallel computing course and we used MPI, same as these guys. Back then, all the personal machines were single-core. If we were lucky, we could test the program by remote-logging into the quad-processor Sun machine. Guess what? We were able to learn quite a bit just running 64 different processes on the same box, even with just a single processor core. It would have been nice to have a machine around with 64 actual cores on it to see how things worked one everythin

There is a whole lot of point-missing going on here. Yes, you could build a faster computer more cheaply using other hardware, but it wouldn't explain the concepts to children (and to first-year CS students, which is pretty much the same thing) nearly so well. Throw together a heap of little itty-bitty boards, each of which, individually, is, as everyone knows, relatively low-powered; knit them together with ordinary Cat5 cable; get high compute performance out of the collection, and you have something which will intrigue children/students and get them thinking about how it works. Show them an anonymous 1U box doing exactly the same job, and you won't get them thinking, because they can't immediately see and understand what it comprises and how it's put together. This is a teaching machine, not a practical machine. Its job is to teach students. It teaches students by being perspicuous.

It's not (yet) a requirement for getting a Slashdot account to demonstrate that you have an IQ slightly south of that of a stick of used chewing gum, but some of you clearly haven't yet got that message.

You're the one missing the point here. I can fit in 1U what used to take an entire rack.

When you can fit that kind of power into 1U, and given the massive leaps in computing power per core, traditional nodes-connected-by-networks clusters are applicable for far fewer people these days. What they should be teaching is proper multithreaded programming techniques.

Something that I don't think got much play in the article is that each of the 64 Pi boards has an SoC that, in addition to the general-purpose processor, also includes a 48-core processor optimized for graphics. And yes, in http://www.raspberrypi.org/archives/1967 [raspberrypi.org] they note that there is already code that can use those processors for graphics. I have little doubt that someone looking at the code can port one of the GPU processing libraries to make use of these processors for other numerically intensiv

RS components have had serious delays. I cancelled my order with them and instead ordered from CPC [cpc.co.uk]. It took approximately 48 hours for RS to refund my money and almost exactly the same amount of time for CPC to deliver.

This story has stuff built out of Legos and Raspberry Pis, so it's definitely worthy of the Slashdot front page. But it could be better - say, if they'd called the order in from their Nokia phone and paid for it using Bitcoins.

Quite seriously, I wondered about making a cluster of Pis to replace a desktop PC I have running in the loft. It really just runs some web servers, PHP, MySQL and a few other fiddly things. I wondered if I could potentially even dynamically boot up Pis to cover load (i.e. spin up some extra web servers when load increases). My big problem is the DB, though - I mainly use Drupal, so I don't have separate read and write DB handles, and I can't scale MySQL horizontally. Also, the Ethernet isn't very fast, so the in

How do you build a supercomputer out of processor modules that cannot reliably communicate with each other?
The Ethernet connectivity of the Pi is based on a small module that attaches to the USB bus.
I don't get it...

When it's absolutely necessary to differentiate, use "primary storage" (typically RAM) and "secondary storage" (typically hard disk). After all, swap is essentially just using a secondary storage device for primary storage, and a RAM disk is essentially using a primary storage device as secondary storage. The lines get pretty blurry if you try to say a specific physical device is only used as primary or secondary storage.

It's field-specific. If it doesn't forget everything after power cycling, we in the computer business call that "storage."

In drag racing, quick and fast are different. In metallurgy, hardness and toughness are different. In typography, readability and legibility are different. In astronomy, revolve and rotate are different. Etc etc etc. Every field has terms like this and yes, it matters, and they should be used correctly when within that field. (Which we currently are: on a tech site, talking about technol

It's field-specific. If it doesn't forget everything after power cycling, we in the computer business call that "storage."

Er, in my CS textbooks (admittedly from the early 90s), memory and storage are interchangeable terms, and RAM counts as "storage" too.

In those books, RAM is "primary storage" and hard drives are "secondary storage".

By a different definition, coming from a different part of the computing field, primary storage is what you get from malloc() and secondary storage is what you get from fopen() -- but we all know how blurry *that* can get.

It varies, of course, but the most common current practice is "memory" (RAM) and "storage" (disk or solid-state - the main distinction being long-term, non-volatile retention). From the Wikipedia page on computer memory [wikipedia.org] (which the GP did not quote):

The term "storage" is often (but not always) used in separate computers of traditional secondary memory such as tape, magnetic disks and optical discs (CD-ROM and DVD-ROM). The term "memory" is often (but not always) associated with addressable semiconductor memory, i.e. integrated circuits consisting of silicon-based transistors, used for example as primary memory but also other purposes in computers and other digital electronic devices.