About Opal Toolkit

Grid-based infrastructure enables large-scale scientific applications
to be run on distributed resources and coupled in innovative ways. In
practice, however, Grid resources are not easy to use: end users
have to learn how to generate security credentials, stage inputs and
outputs, access Grid-based schedulers, and install complex client software.
There is a pressing need to provide transparent access to these resources
so that end users are shielded from the complicated details and are free
to concentrate on their domain science. Scientific applications wrapped as
Web services alleviate some of these problems by hiding the complexities of
the back-end security and computational infrastructure, exposing only a
simple SOAP API that can be accessed programmatically by
application-specific user interfaces. However, writing the application
services that access Grid resources can be quite complicated, especially if
the work has to be repeated for every application. To that end, we have
implemented Opal, a toolkit for wrapping scientific applications
as Web services in a matter of hours, providing features such as
scheduling, standards-based Grid security, and data management in an
easy-to-use and configurable manner.

Why Opal?

The salient features of Opal are as follows:

Opal enables deployment of scientific applications as Web
services without having to write a single line of Web services code

Opal hides the complexity involved in the submission of
computational jobs to Grid resources

Opal manages user data, which includes creation of working
directories, input and output data staging, and persistent storage
for job information and metadata

Opal services can be configured with GSI-based security for
authentication and authorization purposes

Opal services can be accessed from a multitude of languages
(Java, Python, Perl, JavaScript) and platforms (Windows, various
Unix flavors)

So why use Opal, and not just Globus GRAM to launch remote jobs? Here are some reasons.

Deploying an application as an Opal service is very easy, and can be
achieved in under a couple of hours. Once the first Opal service has
been deployed, it can often be done much faster than that.
After the necessary software has been downloaded and installed,
adding a new service is a matter of modifying a few configuration
files and using an Ant script to deploy the service.

Not every user has to deploy the application. In our
experience, deploying a scientific application can
be quite complicated if it has to be done by every user. With Opal,
the service provider deploys the application once, and it can then
be used by any client via a SOAP API.

Every user would typically need an account on the cluster if they use
the traditional Globus GRAM approach. In theory, multiple client DNs
could be mapped to a generic group user account, but this means that
all the users have to ensure that they don't interfere with others
who may be logged on to the same account. The Opal approach is much
cleaner: only authorized users are allowed to run jobs, using
GSI-based transport-level mechanisms. Because they are not
allowed to run *any* arbitrary command, they don't interfere with one
another. Furthermore, it is easier to keep track of user requests
this way because every single user can be accounted for (unlike the
former approach, where users are accounted for only as a single
group).
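
The idea of admitting only authorized users, restricting them to one fixed application, and accounting for each user individually can be illustrated with a minimal Python sketch. This is not Opal's implementation: the `ServiceGate` class, the example DNs, and the application path are all hypothetical names chosen for illustration, and real GSI authentication is assumed to have already established the client's DN.

```python
from collections import Counter


class ServiceGate:
    """Hypothetical stand-in for a service's authorization and accounting step."""

    def __init__(self, authorized_dns, app_binary):
        self.authorized_dns = set(authorized_dns)
        self.app_binary = app_binary      # the one command the service may run
        self.requests_per_dn = Counter()  # per-user accounting, keyed by DN

    def build_command(self, client_dn, app_args):
        """Return the command line for an authorized client, or raise."""
        if client_dn not in self.authorized_dns:
            raise PermissionError(f"DN not authorized: {client_dn}")
        self.requests_per_dn[client_dn] += 1
        # Clients supply arguments only; the executable is fixed by the provider.
        return [self.app_binary, *app_args]


if __name__ == "__main__":
    gate = ServiceGate(["/O=Example/CN=Alice"], "/opt/apps/bin/example_app")
    print(gate.build_command("/O=Example/CN=Alice", ["-in", "input.dat"]))
    print(dict(gate.requests_per_dn))
```

Keying the counter by DN rather than by a shared account name is what lets each request be attributed to an individual user.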

Users don't have to do their own data management. Using the
traditional method, every user would have to stage input and output
files manually. Furthermore, they would have to create new working
directories for every single run (so that output files from older
runs are not overwritten). On the other hand, Opal performs the data
management for the user. It creates new working directories
automatically for every run, and returns URLs to the user to retrieve
the outputs when the execution is complete.
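
As a rough illustration of this kind of per-run data management, the following Python sketch creates a fresh working directory for each run and derives a URL under which its outputs could be served. It is only a sketch under stated assumptions: the function name `create_job_dir`, the job ID format, and the base URL are hypothetical and do not reflect Opal's actual code or naming scheme.

```python
import tempfile
import uuid
from pathlib import Path


def create_job_dir(base_dir: Path, base_url: str) -> tuple[str, Path, str]:
    """Create a fresh working directory for one run.

    Returns (job_id, directory path, URL for retrieving outputs).
    """
    job_id = f"app{uuid.uuid4().hex}"  # hypothetical ID format, unique per run
    job_dir = base_dir / job_id
    # exist_ok=False guarantees that an older run's directory is never reused.
    job_dir.mkdir(parents=True, exist_ok=False)
    output_url = f"{base_url.rstrip('/')}/{job_id}/"
    return job_id, job_dir, output_url


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The base URL below is a placeholder, not a real Opal endpoint.
        for _ in range(2):
            job_id, job_dir, url = create_job_dir(Path(tmp), "http://localhost:8080/jobs")
            print(job_dir.name, url)
```

Because every run gets its own directory, output files from older runs cannot be overwritten, and the returned URL gives the client a stable place to fetch results once execution completes.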

Users don't have to be concerned with the schedulers being used at
the back-end. The service is configured to use a scheduler supported
by Globus (e.g. Condor, SGE), and the users need not be aware of it.
Services can also be configured to use other back-ends such as DRMAA. In
the traditional approach, users would have to submit to a particular
scheduler, and be mindful of what back-end is being used.

Since the applications are exposed via a SOAP API, clients can be
easily written in a variety of languages, and accessed from different
platforms. Clients are shielded from any changes that happen at the
back-end (upgrades, etc.) as long as the SOAP APIs and the URLs for
connecting to the services stay the same. Currently, we have Java
clients used in GridSphere-based portals, JavaScript clients used in
the Mozilla-based Gemstone framework, and Python clients used in the
PMV toolkit. Furthermore, workflow toolkits like Kepler can be used
to orchestrate complex scientific pipelines based on Opal Web
services.