Tuesday, November 30, 2010

Computing power: how much is enough?!

Almost every day I see someone showing off their new hardcore computer with lots of gigahertz and lots of RAM, etc., but is that system fast enough to find the first 100 million or 1 billion prime numbers in under 10 minutes?! Well... it depends on the algorithms and the system configuration.
Time has shown us that there's never enough computing power (and I'm NOT talking about browsing the Internet or writing a text file here...), but what can we do in order to achieve our goals using computers as fast as possible?! There are a few options (off the top of my head):

1. buy better computers
2. use any computer you can get your hands on

1. We always buy better computers in order to do stuff faster, but there are a lot of limitations:

a. budget: we can buy STA (state of the art) computers with 4, 6, etc. cores that will make our life easier, but is this really a good idea?! The answer is NO. Buying an i7 at 3 GHz with 4 cores costs about $300-400, depending on which country you live in. Now, 3 GHz with 4 cores is not the fastest you can get -- Intel has way better CPUs than that (the Extreme series), and they also try to pack as many cores as they can into a CPU -- but let's just stop at the Extreme series, which costs about $1,000/CPU (of course it's worth the price, but it depends on your needs). Now this is a lot just for a processor, but depending on your budget you can buy or skip.

b. operating system: some OSes are better than others -- depending on your needs, of course. Let's take Windows, for example: it is a very good OS for entertainment and office work, but when you need to run tasks that take hours/days/weeks to complete, is it good?! I honestly can't give a definitive answer on this, because for tasks that need a lot of time to complete I turn to my geek friend Linux -- it is very stable, it manages resources very well, and if you don't need a GUI (graphical user interface) it's pretty much rock-solid.

2. What do I mean by "use any computer you can get your hands on"?!
It's not a secret that a lot of companies connect a bunch of computers together through a communication protocol and use each computer as a thread -- WAIT!! how does this work?!!
Basically it depends on the developers... you can have a system that is the Master, on which you execute special programs and which sends task execution requests to 2 or more Slaves; when a slave completes its task, it sends the result back to the master and waits for another request from the master. Pretty simple, eh?! In essence yes, in practice NOT!!
Here is the basic idea:
step 1. Master => send request => slave(s)(1..N computers) -- usually at least 2!!
step 2. Master waits for all slaves to complete the tasks
step 3. when a slave completes its task, it sends the result back to the Master
step 4. Master processes result(s)
Fairly simplistic, right?! But why do I say "at least 2 computers"?!
Over time we have all witnessed hardware failures (I'm proud that I haven't had too many -- yet!!). Let's say we've got a highly intensive task that we believe will take "forever" to complete -- a matter of days. WHAT IF during this time one of the slaves has a hardware failure?! You've lost a shit-load of time, and we all know the equation: time = money -> lose time => lose money. Another way to see this is: the less time you spend on doing something, the more money you earn.
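The 4 steps above can be sketched in a few lines of code. This is just a toy illustration, not a real network implementation: the "slaves" are simulated with threads and queues on one machine, and all names (slave, master, count_primes) are made up for the example -- a real deployment would put each slave on its own box behind TCP.

```python
# Toy sketch of the master/slave flow (steps 1-4 above).
# Slaves are simulated with threads; tasks travel over in-memory queues.
import queue
import threading

def slave(task_q, result_q):
    """Each slave waits for a task, runs it, and sends the result back."""
    while True:
        task = task_q.get()
        if task is None:                     # poison pill: master says stop
            break
        task_id, func, arg = task
        result_q.put((task_id, func(arg)))   # step 3: send result to Master

def master(tasks, n_slaves=2):
    """Step 1: send requests; step 2: wait; steps 3-4: collect results."""
    task_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=slave, args=(task_q, result_q))
               for _ in range(n_slaves)]
    for w in workers:
        w.start()
    for i, (func, arg) in enumerate(tasks):
        task_q.put((i, func, arg))           # step 1: assign tasks
    results = {}
    for _ in tasks:                          # step 2: wait for all slaves
        task_id, value = result_q.get()
        results[task_id] = value             # step 4: process result(s)
    for _ in workers:
        task_q.put(None)                     # shut the slaves down
    for w in workers:
        w.join()
    return [results[i] for i in range(len(tasks))]

def count_primes(n):
    """Trial-division prime count below n -- the kind of job you'd split up."""
    return sum(all(p % d for d in range(2, int(p ** 0.5) + 1))
               for p in range(2, n))

print(master([(count_primes, 100), (count_primes, 1000)]))  # → [25, 168]
```

The important part is that the master never computes anything itself -- it only hands out work and gathers results, which is exactly what lets you add more slaves later.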
Sooo... let's review what is one of the best approaches you can take when you need huge computing power:
1. get as many systems as you can -- no matter how powerful the CPU is or how much RAM the system has
2. implement the logic and the communication protocol (avoid using hard disks as much as possible)
3...N. always improve the idea!!

Now, let's try to throw some ideas of a possible implementation:
- create a flexible communication protocol (I prefer using TCP/IP because you can have GBs of data transferred in seconds) -- maybe use XML?!
- choose the cleanest Linux distribution you can think of -- avoid using a GUI for better performance (on the slave side)
- implement integers (huge integers that can grow up to trillions of digits long), strings (huge strings that can be concatenated from 2 or more slaves), objects (which have their own methods that are transferred along with them master-slave, slave-master, slave-slave), etc.
- use some kind of ping mechanism so that the Master automatically "knows" when a slave is dead and takes appropriate actions (send the task to another slave, e-mail the tech department, etc.)
- the Master CAN NOT execute tasks -- it only needs to assign tasks to slaves and communicate with them
- if you try hard enough, you can also make the slaves "know" when the Master has a failure, so that another "free of task" slave can take its place
- you will have to use a very fast interpreter
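The ping mechanism from the list above could look roughly like this. It's only a sketch of the bookkeeping on the master's side -- the class name, the timeout value, and the task strings are all invented for the example; the actual pings would arrive over your network protocol:

```python
# Sketch of the heartbeat idea: slaves report in periodically, and the
# master declares a slave dead when it misses its deadline, then takes
# its task back so it can be re-queued (and the tech department e-mailed).
import time

class HeartbeatMonitor:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}      # slave id -> last heartbeat timestamp
        self.assigned = {}       # slave id -> task currently assigned

    def heartbeat(self, slave_id, now=None):
        """Record a ping from a slave (now= is injectable for testing)."""
        self.last_seen[slave_id] = time.monotonic() if now is None else now

    def assign(self, slave_id, task):
        self.assigned[slave_id] = task

    def dead_slaves(self, now=None):
        """Slaves that have not pinged within the timeout window."""
        now = time.monotonic() if now is None else now
        return [sid for sid, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def reclaim_tasks(self, now=None):
        """Take back tasks from dead slaves so the master can reassign them."""
        now = time.monotonic() if now is None else now
        reclaimed = []
        for sid in self.dead_slaves(now):
            if sid in self.assigned:
                reclaimed.append(self.assigned.pop(sid))
            del self.last_seen[sid]
        return reclaimed

mon = HeartbeatMonitor(timeout=5.0)
mon.heartbeat("slave-1", now=0.0)
mon.heartbeat("slave-2", now=0.0)
mon.assign("slave-1", "count primes up to 10**8")
mon.heartbeat("slave-2", now=4.0)            # slave-2 keeps pinging
print(mon.reclaim_tasks(now=6.0))            # → ['count primes up to 10**8']
```

At 6 seconds, slave-1 (last seen at 0.0) is past the 5-second timeout while slave-2 (last seen at 4.0) is still alive, so only slave-1's task gets reclaimed.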

What do we get out of this?! Well, some of you know that you can buy good old Pentium 4 computers at 2.x-3 GHz with 512 MB or 1 GB of RAM for ~$100 -- WAIT!! so I can have 10 cores for $1,000?!?! Yup...
You can also implement this in such a way that you can use virtually any OS -- YES, you can have 2 slaves on Windows 2000, 5 slaves on Windows XP, 20 slaves on Linux, 8 slaves on OS X, etc.
Sooo... the "hardcore" system can have a lot of slaves, running on multiple platforms, AND you can always ADD more slaves to the network. OK, but where's the drawback? I know there must be at least one -- yes, there are plenty, but it basically depends on the developer(s):
- startup can take anywhere between a few seconds and a few minutes (depending on the initialization implementation, which needs to be run at the beginning of the program execution) -- this can be tuned!!
- you will have to take care of synchronization -- that's normal in a multithreaded environment
- if the master dies, the whole program's progress can be lost -- this depends entirely on the implementation of the "main executor", or Mr. X ;-)
- you also need to take each system's configuration into consideration -- depending on this, you can execute small tasks on Pentium 3 systems and others on P4s or i3/5/7s
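The synchronization drawback in the list above is the classic lost-update problem: several workers merging partial results into one shared value. A minimal illustration (the counter and the worker function are made up for the example; in a real grid the "shared value" would live on the master):

```python
# Why synchronization matters: many workers updating one shared total.
# Without the lock, the read-modify-write on `total` can interleave
# between threads and increments get lost; with it, the count is exact.
import threading

total = 0
lock = threading.Lock()

def add_partial_result(amount, times):
    """Simulate a slave merging `times` partial results into the total."""
    global total
    for _ in range(times):
        with lock:               # protect the read-modify-write
            total += amount

threads = [threading.Thread(target=add_partial_result, args=(1, 10000))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # → 80000
```

Drop the `with lock:` and you may still get 80000 sometimes -- which is exactly what makes these bugs so nasty to find in a long-running grid job.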

As you can see the most important piece of the puzzle is the developer's skills.

But sometimes you need tens of thousands of computers -- WHAT can you do then?!
We all know that there are hundreds of millions of computers out there that are used only for Internet browsing and multimedia downloads -- how can we use that to our advantage?! Well, a lot of hackers and companies use(d) zombie computers by uploading torrent clients and/or multimedia programs for users to freely download and use. While a lot of computers spend hours a day just downloading, the CPU and a lot of memory are available to be freely used -- legally or illegally, depending on the EULA provided with the software.
Take Skype for example, it uses your CPU and bandwidth in order to provide you with "free" service:

4.1 Permission to utilize Your computer. In order to receive the benefits provided by the Skype Software, You hereby grant permission for the Skype Software to utilize the processor and bandwidth of Your computer for the limited purpose of facilitating the communication between Skype Software users.

4.2 Protection of Your computer (resources). You understand that the Skype Software will use its commercially reasonable efforts to protect the privacy and integrity of Your computer resources and Your communication, however, You acknowledge and agree that Skype cannot give any warranties in this respect.

This is a legal way of using your system; however, others are JUST using your system because you got some illegal software from a torrent or warez website, and you can't really complain about this in court, if you know what I mean -- it's fully your responsibility.

As a Delphi/Pascal developer, what can you use in order to target as many platforms as you can and implement this? HELLO?!?! Free Pascal and Lazarus are a good starting point, and DO NOT forget that as a developer you should NOT be limited to a single programming language -- you can also use C++ and/or Java if you implement your protocol flexibly enough!!

3 comments:

Hello, Dorin. Nice insight, but I can't help point out some things I don't completely agree with.

First off, history taught me that there's no such thing as "enough computing power"; the closest thing to it would be "enough computing power for now". I'm still pumping loads of money to get my hardware stay in line with my expectations and, believe me, I'm not a rocket scientist or something like that.

The second thing I'd like to discuss is the fact that bringing computers to work together is a real challenge. I've done this for various purposes during my career, including building image rendering farms (for 3D Studio Max and the like) and massive data processing clusters in a research project on malware applications. Hard disk drives may be slow, but they're not irreplaceable: I've successfully used solid-state drives connected through SATA-3 controllers and they worked like a charm. Before SSDs became so cheap, I was using RAM-disks -- a more expensive approach, but you need to invest in order to get the best bang, you know. For me, the biggest bottleneck was network communication, although I used 1Gbps equipment for interconnections. Fiber channels between NICs seem a little too radical for me, not to mention that they're hyper-expensive.

Another big issue I hit was keeping the systems in sync, while also increasing redundancy. Using appropriate RAID setups I managed to fight off the possibility of losing data when drones eventually gave their last breath under stress.

It might be true that Pentium cores are extremely cheap (especially the Prescott breed, which comes with hyper-threading and adds extra power to the grid), but you're gonna invest MUCH MORE in the electricity bill and air conditioning. Remember that Pentium cores eat up a lot of electricity and they're not quite efficient: they warm up and need extra ventilation, which brings us again to shedding money. I'd personally go for an Intel Core2Duo chip or the slightly more expensive Opteron CPUs from AMD.

We've got used to imagining computers each time we mention computing power. Clusters and grids have been around for some time, gained popularity and people liked the idea. Well, they may NOT be the best pick for delivering tremendous amounts of power, especially when dealing with floating point operations. I think that both the IBM Cell CPUs in PlayStation consoles as well as nVidia's new video GPUs can deliver more than a simple processor can. Haven't tried them yet, but they may be a much simpler solution.

And although I'm a huge fan of Delphi, I don't think that it would scale to the size of a grid. Why? That's simple. As a 32bit-only language, an application written under it won't be able to manage the amount of memory necessary for computing-intensive tasks. I'd personally choose something lighter and more flexible. Python is the first thing that comes to mind.

Hello Bogdan, thank you for your comments. I agree with you; however, depending on what you're trying to achieve and on your budget, you can go for the nVidia GPUs with CUDA technology -- but those also eat a lot of electrical power (moreover, you're limited to 1-2 GPUs/PC, depending on the motherboard). If you really want to be "green", then go for the i3/i5 if you can allocate ~$300-350/piece (if you buy a bunch). I totally agree with Delphi not being good; however, there is the Free Pascal Compiler (closest to the Delphi language), which can target 64-bit platforms, so that shouldn't be a problem. P.S. you might wanna take a look at XtremeOS (a Linux distribution).