Oracle Blog

Josh Simons' Coordinates in the Blogosphere

Thursday Jan 14, 2010

The Sun Grid Engine team has just released the latest version of SGE, humbly called Sun Grid Engine 6.2 update 5. It's a yawner of a name for a release that actually contains some substantial new features and improvements to Sun's distributed resource management software, among them Hadoop integration, topology-aware scheduling at the node level (think NUMA), and improved cloud integration and power management capabilities.

Thursday Dec 18, 2008

The Sun Grid Engine team is looking for experienced SGE users interested in
taking their latest Update release for a test drive. The Update includes bug
fixes and some new features. Two features in particular caught my eye:
a new GUI-based installer and optimizations to support very large
Linux clusters (think TACC Ranger).

Full details are below in the official call for beta testers. The beta program will run until
February 2nd, 2009. Look no further for something to do during the upcoming holiday
season.

Sun Grid Engine 6.2 Update 2 Beta (SGE 6.2u2beta) Program

This README contains important information about the target audience of
this beta release, new functionality, the duration of this SGE beta program,
and how to get support and provide feedback.

Audience of this beta program

Duration of the beta program and release date

New functionality delivered with this release

Installing SGE 6.2u2beta in parallel to a production cluster

Beta program feedback and evaluation support

Audience of this beta program

This Beta is intended for users who already have experience with the Sun
Grid Engine software or DRM (Distributed Resource Management) systems of
other vendors. This beta adds new features to the SGE 6.2 software. Users
new to DRM systems or users who are seeking a production ready release
should use the Sun Grid Engine 6.2 Update 1 (SGE 6.2u1) release which is
available from here.

For the shipping SGE 6.2u1 release we are offering free 30-day evaluation
support by email.

Duration of the Beta program and release date

This beta program lasts until Monday, February 2, 2009. The final release of
Sun Grid Engine 6.2 Update 2 is planned for March 2009.

New functionality delivered with this release

Sun Grid Engine 6.2 Update 2 (SGE 6.2u2) is a feature update release for SGE
6.2 which adds the following new functionality to the product:

a GUI-based installer that helps new users install the software more
easily. It complements the existing CLI-based installation routine.

new support for 32-bit and 64-bit editions of Microsoft Windows Vista
(Enterprise and Ultimate Edition), Windows Server 2003R2 and Windows
Server 2008.

a client- and server-side Job Submission Verifier (JSV) that allows an
administrator to control, enforce and adjust job requests, including
job rejection. JSV scripts can be written in any scripting language,
e.g. Unix shells, Perl or Tcl.

consumable resource attributes can now be requested per job. This makes
resource requests for parallel jobs much easier to define, especially
when using slot ranges.

on Linux, the use of the 'jemalloc' malloc library improves performance
and reduces memory requirements.

the use of the poll(2) system call instead of select(2) on Linux
systems improves scalability of qmaster in extremely large clusters.
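
To illustrate the last item: select(2) works on fixed-size descriptor sets (FD_SETSIZE, commonly 1024), while poll(2) takes a list of descriptors of any length, which matters when one daemon holds a connection per execution host. The following is a minimal Python sketch of the poll style of waiting, not qmaster's actual code; the function names `wait_readable` and `demo` are made up for this example, and `select.poll` is only available on platforms that provide poll(2), such as Linux.

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Return the sockets in `socks` that have data to read, using poll(2)."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        # Register interest in "readable" events for each descriptor.
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    # poll() returns (fd, eventmask) pairs for descriptors that are ready.
    ready = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in ready if event & select.POLLIN]


def demo():
    """Send one message over a local socket pair and wait for it with poll."""
    a, b = socket.socketpair()
    try:
        b.sendall(b"ping")
        ready = wait_readable([a])
        return len(ready) == 1 and ready[0] is a
    finally:
        a.close()
        b.close()
```

Because the descriptors are passed as a list rather than a fixed-size bitmap, the same function works unchanged whether it is handed ten sockets or several thousand.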

Installing SGE 6.2u2 in parallel to a production cluster

As with every SGE release, it is safe to install multiple Grid Engine
clusters running multiple versions in parallel if all of the following
settings are different:

directory

ports (environment variables) for qmaster and execution
daemons

unique "cluster name" - from SGE 6.2 the cluster name is
appended to the name of the system wide startup scripts

group id range ("gid_range")
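
The checklist above can be expressed as a small pre-installation check. This is a rough Python sketch only: the `ClusterSettings` type, the `conflicts` function and all values in the usage example are hypothetical and not part of SGE, and treating overlapping gid ranges (rather than only identical ones) as a conflict is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterSettings:
    """Hypothetical summary of the settings that must differ between clusters."""
    directory: str
    qmaster_port: int
    execd_port: int
    cluster_name: str
    gid_range: Tuple[int, int]  # (first_gid, last_gid), inclusive


def conflicts(a: ClusterSettings, b: ClusterSettings) -> List[str]:
    """Return the names of settings that clash between two clusters."""
    problems = []
    if a.directory == b.directory:
        problems.append("directory")
    # Any shared port number between the two clusters counts as a clash.
    if {a.qmaster_port, a.execd_port} & {b.qmaster_port, b.execd_port}:
        problems.append("ports")
    if a.cluster_name == b.cluster_name:
        problems.append("cluster name")
    # Assumption: overlapping ranges are unsafe, not just identical ones.
    if a.gid_range[0] <= b.gid_range[1] and b.gid_range[0] <= a.gid_range[1]:
        problems.append("gid_range")
    return problems
```

For example, with made-up values for a production cluster and a beta cluster:

```python
prod = ClusterSettings("/opt/sge-prod", 6444, 6445, "prod", (20000, 20100))
beta = ClusterSettings("/opt/sge-beta", 7444, 7445, "beta", (21000, 21100))
print(conflicts(prod, beta))  # an empty list means no clashes were found
```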

Starting with SGE 6.2, the Accounting and Reporting Console (ARCo)
accepts reporting data from multiple Sun Grid Engine clusters. If you follow
the installation directions for ARCo and use a unique cluster name for
this beta release, there is no risk of losing or mixing reporting data from
multiple SGE clusters.

Beta Program Feedback and Evaluation Support

We welcome your feedback and questions on this Beta. We ask you to
restrict your questions to this Beta release only. If you need
general evaluation support for the Sun Grid Engine software
please subscribe to the free evaluation support by downloading and using the
shipping version of SGE 6.2 Update 1.

Wednesday May 14, 2008

The Open Source Grid and Cluster Conference is being held
this week in Oakland, California. I attended the first day of the conference
before flying home to meet a personal commitment. My favorite talk of the day was
Paul Brenner's presentation titled Grid Heating: Dynamic Thermal
Allocation via Grid Engine Tools.

Brenner, who works as a scientist in the University of Notre Dame's
Center for Research Computing, is exploring
innovative ways to exploit the waste heat generated by HPC and
other datacenters via partnerships with various municipal entities
in the South Bend area. His first prototype, currently
in progress, involves placing a rack of HPC compute nodes at a local
municipal greenhouse, the South Bend Greenhouse and Botanical
Garden.

The greenhouse had recently been forced to close a portion of its facility
due to high natural gas heating costs. Brenner wondered if he could help.
Since current datacenters can be viewed as massive electricity-to-heat
converters (with a computational byproduct), it seemed there might be
an opportunity to exploit the waste heat in some useful way. But transferring
heat, especially low-grade waste heat, over distances is very inefficient.
Was there a way to overcome this barrier?

Enter grid computing with its ability to harness remotely located compute
resources. If Brenner couldn't transport the heat to the greenhouse, why
not place the datacenter at the greenhouse? The garden gets the heat and
Notre Dame gets the compute resources via established grid computing
capabilities like Sun's Grid Engine
distributed resource manager, which is already in use at Notre Dame. Cool idea? Hot idea!

Based on early prototype work which involves placing a single rack in
the greenhouse, the idea looks like a promising way to reduce natural
gas heating requirements for the facility. Brenner has shown he can
use grid scheduling software to deliver a desired temperature (within
a range, of course) by simply adding or throttling compute jobs on
the greenhouse cluster, which communicates with Notre Dame via
a wide-area wireless broadband connection.
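
The talk did not spell out the control logic, but the general idea of steering temperature by adding or throttling jobs can be sketched as a simple deadband controller. Everything in this Python sketch is an assumption for illustration: the function name `adjust_job_slots`, its parameters and the temperature values are hypothetical, and a real setup would apply the result through the scheduler's own configuration rather than through this function.

```python
def adjust_job_slots(temp_c, low_c, high_c, active_slots, max_slots, step=1):
    """Return the new number of job slots to keep busy on the remote rack.

    Below the target range: run more jobs to produce more heat.
    Above the target range: throttle jobs to produce less heat.
    Inside the range: leave the load unchanged.
    """
    if temp_c < low_c:
        return min(max_slots, active_slots + step)
    if temp_c > high_c:
        return max(0, active_slots - step)
    return active_slots
```

Called periodically with the current greenhouse temperature, this nudges the load up or down one step at a time and holds it steady while the reading stays within the range.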

He has looked at humidity issues and so far they don't seem to be a
problem given the ranges supported by typical compute gear. And
he points out that while the greenhouse environment does not offer
the highly filtered environment of a controlled datacenter, the
particulate tolerance for typical compute gear
is far in excess of EPA guidelines for people.

Phase II will involve placing three full racks of gear at the greenhouse
to significantly reduce heating costs. Notre Dame will pay the electrical
costs and use the compute resources. The city saves money on heating.

While the greenhouse is an interesting experiment, it is not ideal since its
heating requirements will fluctuate seasonally. There are, however, other
installations that have constant heating requirements--for example,
hospitals have a 24x7 need for hot water. Sites like this could be
interesting for future deployments.

Monday May 05, 2008

I'm a bit late posting this, but did want to mention that the Peach open movie project recently released
Big Buck Bunny, a 3D animated movie
that was rendered on the Sun Grid Compute Utility at Network.com.
Details on the operation of the Peach render farm are here. You can also click on the diagram below for a closer look at the overall IT setup for the project.