a bug corrected on server side concerning data download: if a data file is not available on the server side, the distributed parts are now cleanly informed

a bug corrected on server side concerning the work-alive signal (on finished tasks)

a bug corrected in install packages concerning “adduser” usage

xwversion returns 1 if client should be upgraded

a bug found in data management (WIN32 worker)

xwconfigure introduces the --newkeystore and --newalias command line parameters. By default, keystores are created if missing and left unmodified if they already exist.

--newkeystore can be used to force keystore regeneration (even if keystores already exist). Doing so invalidates any deployed platform: clients and workers must be redeployed

--newalias can be used to insert a new alias into existing keystores.
This helps keep a deployment alive before keystores expire (it can then be used in conjunction with the keystore.uri variable in xtremweb.server.conf)

The platform is now able to update the server public key. This works if and only if the server key has not expired yet.

Purpose: keep a deployment alive even if the server key expires

What: distribute the new server public key automatically to workers and, on demand, to clients

When: before the actual server key expires

How :

use the xwhepgenkeys script to add the new key to the keystores

insert the worker keystore (containing both the current and the new server public keys) as public data in the XWHEP server

stop the server

set keystore.uri variable in xtremweb.server.conf

restart the server
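
For illustration, the keystore.uri entry might look as follows in xtremweb.server.conf. Only the variable name comes from this document; the host and path in the URI are invented examples:

```
# xtremweb.server.conf (excerpt) -- the URI value below is an invented example
keystore.uri=http://xwserver.example.org/keystores/worker.keystore
```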

Security: this does not break security since

workers and clients must have the current server public key to connect to the server

the updated keystore is then distributed through an encrypted communication channel

workers and clients must have valid credentials to connect to the server

workers and clients safely keep their keystores

the distributed keystore contains server public keys only

XML un/marshalling has been improved. This results in a 20% performance increase, as shown in the next two figures, which represent the client execution time to retrieve 1000 jobs from the server of our XWHEP deployment at LAL. The first figure shows that the client needed more than 100 seconds using XWHEP 5.8.0; the second shows that less than 80 seconds is now needed using XWHEP 5.9.0.

server: the scheduler can be defined in the configuration file and dynamically loaded at run time
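
As a sketch of what configuration-driven loading looks like (in Python for brevity; XWHEP itself is Java, and the "scheduler.class" property name and class names below are invented for illustration):

```python
# Illustrative sketch: resolve a scheduler class named in the configuration
# at run time. All names here are hypothetical, not XWHEP's actual API.
import importlib

class RoundRobinScheduler:
    """Fallback used when the configuration names no scheduler class."""
    def select(self, pending):
        return pending[0] if pending else None

def load_scheduler(config):
    """Resolve a 'scheduler.class' entry of the form "module:ClassName"."""
    spec = config.get("scheduler.class")
    if not spec:
        return RoundRobinScheduler()
    module_name, class_name = spec.split(":")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls()

# A config naming no scheduler falls back to the default:
sched = load_scheduler({})
print(sched.select(["task-1", "task-2"]))  # task-1
```

The point of the indirection is that swapping schedulers requires a configuration change only, not a rebuild.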

the scheduler has been totally rewritten: it was not only buggy and conceptually erroneous, but also unfair. In this version, the scheduler:

no longer caches DB rows, since this was a source of memory leaks

improves SQL request usage so that it retrieves the expected rows only

is now entirely responsible for task management

retrieve() retrieves WAITING tasks from the DB and sets them to PENDING

select() retrieves a PENDING task on worker request

but still does not fairly manage jobs among users

if you delete a user, you can reuse its login when inserting a new user

if you delete a user group, you can reuse its label when inserting a new group

user passwords are never sent over the network

the x86_64 processor is now correctly managed for Apple workers

memory leaks resolved on the server side. The next figures show server memory consumption. The first shows version 5.7.7's leaking behaviour: after two days, the amount of allocated memory climbs to 600 MB. The second shows a more stable consumption with version 5.8.0: after two days, the amount has remained very stable at only 288 MB.

New features.

Some new features are implemented to improve confinement and security access.

user rights slightly modified: ADVANCED_USER allows user group management. Consequence: for each user group, we must define a user group administrator who has enough rights to manage their own group (users, applications)

Any object can now be confined: all objects have OWNERUID and ACCESSRIGHTS (AR) fields

xwchmod now applies to any object

Major consequences :

administrator and worker identities are private (AR: 0x700): they cannot be listed by standard users

we can confine a user group by setting its AR to 0x750

if a group is confined, an administrator of the group MUST be inserted

any user inserted in a user group inherits AR from its group

users MUST be inserted in a group by the group administrator, if any (the user's owner MUST be the group administrator, if any)

users in a confined group are listable by user group members only

the client can help insert a user group and its administrator:
$> xwsendusergroup MyGroupLabel theGroupAdminLogin theGroupAdminPassword theGroupAdminEmail

If a user group is not confined, everything behaves as in previous versions.
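
The AR values above (0x700, 0x750) suggest POSIX-like owner/group/other rwx triplets; under that assumption (this is an interpretation, not XWHEP's actual code), they can be decoded as follows, in Python for illustration:

```python
# Decode XWHEP-style access rights. Assumption: as the values 0x700 and
# 0x750 above suggest, the bits follow POSIX-like owner/group/other rwx
# triplets; this sketch is illustrative only.
def describe_access_rights(ar):
    triplets = {}
    for name, shift in (("owner", 8), ("group", 4), ("other", 0)):
        bits = (ar >> shift) & 0x7
        triplets[name] = (
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return triplets

print(describe_access_rights(0x750))  # owner rwx, group r-x, other ---
```

Read this way, 0x750 matches the confinement described above: full rights for the owner, read/list for group members, nothing for others.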

This version enables versioning. Here is how: all messages (requests, answers) are now encapsulated in an XML root element ...
so that the version is included in the protocol.
On top of that, some fields have been deleted, added, or renamed
(see -B.2- and -B.3-).
All objects now have OWNERUID and ACCESSRIGHTS fields.
Some objects had CLIENTUID or USERUID; these have been renamed OWNERUID.

But, of course, versions prior to 5.8.0 know nothing about that XML root element, nor about the column changes. The current version takes care of all that. If a distributed part sends a message without the XML root element, this clearly means that this part runs a version prior to 5.8.0.
In that case we do not encapsulate the answer with the XML root element,
and we take care to answer with the expected attributes.

XWHEP 5.7.5 introduced a limit on cache entries (set to 10K entries), but the implementation was buggy; this is corrected. The cache now manages its entries with a least recently used (LRU) policy.
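
A minimal sketch of LRU management with a hard entry limit, in the spirit of the 10K-entry cache above (Python for illustration; XWHEP's cache is Java, and this is not its actual implementation):

```python
# Minimal LRU cache with a hard entry limit: every hit refreshes an
# entry's recency, and inserts beyond the limit evict the least
# recently used entry.
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        # A hit moves the entry to the most-recently-used position.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict the least recently used entry once the limit is exceeded.
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

cache = LRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now most recently used
cache.put("c", 3)       # evicts "b", the least recently used
print(cache.get("b"))   # None
```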

server: detecting lost tasks is no longer done every 15 s (which is absolutely not necessary), but only every ALIVETIMEOUT (default: 5 min). This greatly reduces CPU consumption.

a bug corrected in the access logger: it now opens only one file (and not one per thread)

Launcher improved : it does not start if it has a wrong SSL key, or if URL is malformed

Launcher: an error on the library path has been corrected

worker improved : error messages are more detailed

worker corrected: on network failure while uploading results, the results were lost and the worker was unable to request any new job. The worker now correctly recovers, retries the result upload, and requests a new job.

xwconfigure path bugs corrected

xwhep.bridge.pl modified: it can now connect to the DB directly to improve performance

xtremweb.gmond.pl : MySQL connections management corrected

a bug resolved on the server side regarding MySQL connection usage, which led to too many TIME_WAIT sockets

a bug resolved in I/O library usage: all handles are now correctly closed on exceptions/errors

a bug resolved in the client cache: the cache can no longer exceed 10K entries

scripts improved: they now use 'type' rather than 'which', which is preferable in bash/sh scripts

on the client side, xwchmod now accepts UIDs too (and not only URIs)

build corrected : it now correctly generates x86_64 worker

we no longer use String.intern() because it caused “OutOfMemory in PermGen” exceptions on the server side (see http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html)

admin and worker users are not inserted in any group (this was unused and could be confusing)

configuration variables are now all trimmed (leading and trailing whitespace is removed)
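
The trimming behaviour amounts to something like the following (an illustrative Python parser for properties-style files, not XWHEP's actual Java code):

```python
# Parse "key=value" configuration lines, trimming leading and trailing
# whitespace from both keys and values, and skipping comments and blanks.
def parse_config(text):
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config

print(parse_config("launcher.url =  http://example.org/xw.jar  "))
# {'launcher.url': 'http://example.org/xw.jar'}
```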

the configuration variable KeyStore was not correctly initialized; KeyStore can now correctly be read either from the config file or from the Java parameter -Djavax.net.ssl.keyStore

a bug corrected in MySQL error handling on XWHEP server side

the usage of job groups has been corrected

a bug corrected on the scheduler

X509 proxy certificate usage modified according to the EDGeS meeting (June 2009).
The client now automatically checks whether the $X509_USER_PROXY environment variable is set, so that the XWHEP “--xwcert” option is not required any more.
If “--xwcert” is used, the certificate must be valid, otherwise the job is not sent.
If the $X509_USER_PROXY certificate is not valid, the job is sent without any certificate. Any worker can download a job that has been submitted with an X509 proxy (not only Pilot Jobs).
Pilot Jobs are private workers: they can run any job of their owner.

X509 proxies are never downloaded by workers (not even Pilot Jobs).
Only the bridge may download X509 proxies.

A bug corrected in the communications handler to integrate a new protocol (here, ADICS)

New feature

XWHEP now integrates ADICS protocol by Cardiff University.
See http://www.p2p-adics.org.
This is an “external” new feature :

Note: this version is numbered 5.7.1 and not 5.6.4 because 5.6.3 introduced a new feature and should have been numbered 5.7.0.
This is only corrected now.

Corrections

xwconfigure now loops on each variable until a valid value is provided

Tar file creation error corrected: files matching *ps were excluded from the tar file, which wrongly excluded xwapps, xwgroups, etc.

Mac OS X worker scripts bug solved

Mac OS X installer errors solved. These were due to the fact that NetInfo is no longer supported since Mac OS X 10.5. Directory Service tool usage now replaces NetInfo in our scripts. This is 10.4 and 10.5 compliant; tested on 10.4.11 and 10.5.7.

To avoid confusion between XtremWeb and XWHEP, the latter now installs everything in /opt/XWHEP-server, /opt/XWHEP-worker and /opt/XWHEP-bridge. XWHEP packages (Debian, RedHat, Apple) install XWHEP-something packages and not xtremweb-something packages. All previously existing files (scripts, configuration files, etc.) keep their “old” names (xtremweb.something) so as not to disturb those who have already developed over XWHEP.

default socket timeout set to 0

HTTPLauncher, which tries to find the latest JAR version and starts the worker, has been corrected: it now uses the latest JAR on the local FS; it now stops immediately if it cannot write the downloaded JAR to the local FS; and since it has been reported that the “-server” Java option is not available on all JVMs, HTTPLauncher first tries to launch the worker with that option and, on error, retries without it.
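
The try-then-retry logic can be sketched as follows (in Python for illustration; the launch runner is injected so the fallback is testable without a JVM, and the command layout is an assumption, not HTTPLauncher's actual code):

```python
# Sketch of the launcher fallback described above: try the "-server" JVM
# option first, and retry without it if the launch fails.
def launch_worker(jar, runner):
    """runner(cmd) returns True when the launch succeeded."""
    for opts in (["-server"], []):       # preferred options first
        cmd = ["java"] + opts + ["-jar", jar]
        if runner(cmd):
            return cmd
    raise RuntimeError("unable to launch worker")

# A JVM that rejects "-server" forces the fallback:
picky_jvm = lambda cmd: "-server" not in cmd
print(launch_worker("xtremweb.jar", picky_jvm))
# ['java', '-jar', 'xtremweb.jar']
```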

On the server side, inter-thread deadlocks have been corrected in the communication layer; they led to “unreachable” or “timed out” communication errors on the worker and client sides.

A bug corrected on server side : it was impossible to reuse an application name, even if the application was deleted

The worker now stores its own UID in its config file so that it does not appear several times in the server DB

installers corrected (RPM, DPKG, Apple PKG): the FS tree must belong to the worker so that it can write the downloaded JAR, if any

New feature

The worker can now manage dynamically linked applications (and not only statically linked ones)

The next figures show server memory consumption. The first shows version 5.4.0's leaking behaviour: after four days, the amount of allocated memory climbs to 382 MB. The second shows a more stable consumption with version 5.5.0: after six days, the amount is a third of 5.4.0's memory usage.

this does not introduce a security hole since communications are encrypted; it is the user's responsibility to ensure config file security;

this lightens compilation, which does not require SQL access any more.

a bug has been corrected on worker side, regarding data download;

the notions of groups and sessions are (re)introduced.
Groups and sessions aggregate jobs.
Sessions are automatically removed on client disconnection (the client disconnects at shutdown or user switch);

there is a bug in the client GUI: deleting and downloading several rows is now disabled. This is due to a bug in the table sorter; we don't correct it since Java 6 introduces native table sorters. It will be corrected when our package is ported to Java 6;

our package is now Java 6 compatible (even if we don't use Java 6 specific features, see above) and 64-bit compatible.

in the configuration file, the SLKeystore variable can now contain a relative path;

resource owner can open http://localhost:4324 to configure their worker;

cache management improved and slightly modified:

in general, information stored in the cache is not re-downloaded from the server. There are three exceptions: works, tasks and hosts are always re-downloaded from the server since this information is subject to frequent change;

the client keeps its cache from run to run. A new command, xwclean, is introduced to clean the client cache; this command is also available in the Comm menu of the GUI;

the worker cleans its local disk on shutdown; hence, the worker does not keep its cache from run to run.

a bug solved on the server side: memory consumption is more stable.

The JAR file is now also provided. To update, you have to:

copy the JAR file into the lib directory and restart your server

copy the JAR file to where launcher.url, in the worker config file, points; on next reboot, workers will automatically download it.

The next figures show server memory consumption. The first shows a leaking behaviour: after only 1h30, the amount of allocated memory climbs to 78 MB. The second shows a more stable consumption: after two hours, the amount is still the initial one, at only 31 MB.

synchronization improved: each message now expects an answer from the server;

performance degradation solved on server side.

The next two figures show 1000 submissions received by the server. We can see the performance degradation in the first; the second figure shows that the degradation is now solved.
Total execution time is 2.5 times higher because messages now expect an answer from the server: this increases synchronization.

the scheduler has been modified to improve performance. It is no longer a simple round-robin: it now searches the full task set to find a task that fits the worker's needs;

the autotest script now submits group tasks too.

The following SQL command shows the results more readably:
select apps.name as app,label,
hex(works.accessrights),hex(apps.accessrights),
works.status,users.login as worker_login,
users.rights as worker_level
from
works,apps,tasks,hosts,users
where
works.appuid=apps.uid and
works.uid=tasks.uid and
tasks.hostuid=hosts.uid and
hosts.owneruid=users.uid
order by works.label;

We can see that the public worker (whose login is worker) has run public jobs only (whose labels are public…); private workers (whose logins are user…) have run only jobs of their own identity (whose labels end with their own login).

The server is now certified by a self-signed SSL key which must be generated by createKeys. The next version will use an X.509 certificate certified by a CA.

Installation and deployment need the following actions (in that order: createKeys must be executed before install).
$> make removeDB
$> make installDB
$> make clean
$> make
$> make createKeys
$> make install

For a production deployment, keys must be safely stored; otherwise (if you lose or accidentally regenerate the keys) a full re-deployment is necessary.

Electronic key usage has a cost in terms of communication.
The figure on the left shows the number of TCP packets needed without SSL; the figure on the right shows the same with SSL: the packet count roughly doubles.

A script to auto-test the platform is now provided in the bin directory:
$> xtremweb.tests.pl

You must have the platform's privileged rights to run the script (as provided by the default client config file).

The script does the following:

insert a new public application

insert two new user groups

insert 6 new users : two users per group and two users with no group

insert a private application per user

insert 12 jobs: one private and one public per user

launch one public and 6 private workers on local host

monitor the jobs

At the end of the script, we can see that all jobs are COMPLETED.
We can verify this with the following SQL command (we can't check it with the client because the client does not show the worker identity for each job):
select works.status,works.label,hosts.name,users.login
from users,works,tasks,hosts
where tasks.uid=works.uid
and tasks.hostuid=hosts.uid and
hosts.owneruid=users.uid
order by users.login;

Next figure shows auto test results.

We can see that the public worker (whose login is worker) has run public jobs only (whose labels are public…); private workers (whose logins are user…) have run only jobs of their own identity (whose labels end with their own login).
