/var/opinion - Parallel Is Coming into Its Own

Changing my mind about distributed computing made me aware of the sweet aroma of opportunity.

I started writing about computing back in the 1980s. I don't want
to say which year, or do the math for how long I've been doing this.
It makes me feel old.

I've made a plethora of predictions since then. Some of them left me
red-faced. Some were spot-on. Others have not yet been fulfilled, but
I still think they're on target.

One of my earliest predictions was rather easy, but it was considered
controversial back in the 1980s. I said that it was only a matter of time
before we bumped up against the limits of Moore's Law, and the only
viable answer would be parallel processing. Lo and behold, dual-core
processors are now common, and it won't be long before we see quad-core
processors, and the multicore Cell processor in the PlayStation 3 is
around the corner.

Naturally, the next logical step is clustering or other means of
distributed processing. Here's where I begin to get nervous. When “grid
computing” became a buzzword, my knee-jerk reaction was “no, thanks”.
I don't work in a company office anymore, but if I did, I wouldn't want
the company off-loading processing to my desktop workstation unless
I was certain that everything ran in a completely isolated sandbox.
Put the grid processes in a chroot environment on Linux, for example.
Even then, I'm not sure I'd be happy about the idea. What if I want to
do something compute-intensive, and the grid process decides it wants
my CPU cycles more than I do? This isn't supposed to happen, but since
it's all in the hands of some administrator with his or her own agenda, why
should I trust that it won't happen?
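The chroot-plus-priority idea above can be sketched roughly as follows. This is a minimal illustration, not any particular grid product's setup; the jail path and the grid-worker binary name are hypothetical:

```shell
#!/bin/sh
# Build a minimal jail for an untrusted grid worker (run as root).
# The jail path and the "grid-worker" binary are hypothetical examples.
JAIL=/srv/grid-jail
mkdir -p "$JAIL/bin" "$JAIL/lib" "$JAIL/lib64" "$JAIL/tmp"

# Copy in the worker and the shared libraries it links against.
cp /usr/local/bin/grid-worker "$JAIL/bin/"
ldd /usr/local/bin/grid-worker | awk '/\//{print $(NF-1)}' | \
    while read -r lib; do
        mkdir -p "$JAIL$(dirname "$lib")"
        cp "$lib" "$JAIL$lib"
    done

# Run the worker confined to the jail, at the lowest scheduling
# priority so it yields the CPU whenever local work needs it.
chroot "$JAIL" /bin/grid-worker &
renice +19 -p $!
```

The renice step addresses the cycle-stealing worry directly: a worker at nice 19 only gets the CPU when nothing else wants it. The chroot, of course, limits filesystem access, not CPU use, which is why both pieces are needed.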

It's the lack of control and fear of security breaches that make me
nervous. I've got four computers in my home that nobody ever turns off,
and two more for special purposes that I turn on as needed. The two
hand-me-down computers my kids use sit idle much of the time, unless my
daughter is browsing the Web, or my son is playing World of
Warcraft.
I use a server as a centralized provider of resources such as printers,
files and e-mail. It's a very old machine, but it never breaks a sweat
given its purpose. All this represents a tremendous amount of wasted
processing power. I'd love to tap into that unused power at home.
This is a safe environment, because I'm not talking about exposing
my processing power to everyone on the Internet. I'm talking about
distributing workloads across local machines.

In principle, however, Sun was right all along when it said, “the network
is the computer”. Other companies, such as IBM, worked along the same
lines before Sun did, but I don't know of any company that said it better
than Sun. “The network is the computer” is a powerful phrase. As long
as there is adequate security built in to every aspect of distributed
processing, it makes perfect sense to provide common services
via remote procedure calls and distribute every conceivable workload
across as many computers as you want to make available to the system.
If someone could make me feel comfortable about security and control,
I'd buy into distributed processing in a big way.

Here are the challenges as I see them. First, there's the problem of
heterogeneous platforms. How do you distribute a workload across machines
with different processors and different operating systems? ProActive is
one of several good platform-agnostic distributed computing frameworks
(see www-sop.inria.fr/oasis/ProActive). It is 100% pure Java,
so it runs on any platform that supports Java. It has a great graphical
interface that lets you manage the way you distribute the load of a job.
You can literally drag a process from one computer and drop it onto
another.

The problem is that a tool like ProActive doesn't lend itself to the
way I want to distribute computing. I want it to be as transparent as
plugging a dual-core processor into my machine. Unfortunately, you can't
get this kind of transparency even if you run Linux on all your boxes.
The closest thing to it that I can think of is distcc, which lets you
distribute the workload when you compile programs. Even this requires
you to have the same version of compiler (and perhaps some other tools)
on all your boxes. If you want this to be a no-brainer, you pretty much
have to install the same distro of Linux on all your machines.
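As a concrete example, here is roughly how a distcc farm gets wired up, assuming the same gcc version is installed everywhere; the hostnames and subnet are made up:

```shell
#!/bin/sh
# On each helper machine, run the distcc daemon and allow the
# local subnet to submit compile jobs (the subnet is an example).
distccd --daemon --allow 192.168.1.0/24

# On the machine driving the build, list the helpers
# (hostnames here are hypothetical)...
export DISTCC_HOSTS='localhost server kids-box-1 kids-box-2'

# ...then compile through distcc, with enough parallel jobs to
# keep the remote CPUs busy.
make -j8 CC='distcc gcc'
```

Note what this setup does not do: distcc ships preprocessed source and receives object files back, so it sidesteps header and library differences, but the compilers themselves still must match, which is exactly the version-matching caveat above.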

The bottom line here is that I smell an opportunity for Linux.
I would love to see a project that makes distributed computing on Linux
brainlessly transparent and distribution-agnostic. I'm talking about the
ability to start up any computation-intensive application and have it
automatically distribute the work across other machines on the network
configured to accept the role of yet another “processor
core”. You can
make this transparent to the application by building it into the core
user-space APIs. You manage it like you would any other network service.
Is this too pie in the sky? I'd love to hear your opinions.

Nicholas Petreley is Editor in Chief of Linux Journal and a former programmer, teacher, analyst and
consultant who has been working with and writing about Linux for more
than ten years.
