
Components of LLview

LLview is a client-server application. The LLview client is the graphical user interface that displays the utilization of a cluster controlled by a batch system.
The server application, called LML_da, runs on the login node of the target system. It executes batch system commands
and parses their output to generate an XML file containing a snapshot of the current system status.
This XML file is transferred to the LLview client, which interprets and renders the status data. Depending
on the target system, different adapter scripts parse the output of the batch system commands.
Adapters exist for Grid Engine, LoadLeveler, LSF, Moab, PBS, TORQUE and SLURM. Each
adapter reads the output of the batch system commands and derives an LML file from the combination of job
and node status data. The gathered information about the parallel system is converted into the
Large-scale system Markup Language (LML),
which defines the structure of XML files describing the current status of a supercomputer.
LML serves as the interface between LML_da and the client application. As a result, the two components are
independent of each other, which simplifies their extension and improves modularity.

Components and data flow of LLview

Client llview

The main part of the LLview client is the node display. It shows a small box for each processor of an SMP node.
The color of each box encodes the job running on that processor. In addition, the node display contains
elements showing global information about each node, such as status, memory usage
and CPU load. When the mouse pointer moves over a processor box, the corresponding information is highlighted in
the other display elements of LLview: the job list, usage bar,
information panel, three-day history view, scheduling prediction and utilization chart.
See also: -> Using the LLview client

Server LML_da

The server application is composed of a set of independent modules that operate on LML. Each target system
adapter calls the relevant batch system commands to retrieve the currently running and submitted jobs
as well as the states of the compute resources. For example, on a TORQUE system the qstat command
is executed and its output parsed for job status data; for node data, pbsnodes is run. The
data is converted into the uniform LML format, which is processed further in subsequent modular steps.
There is a module for merging multiple LML files (for instance one job data file and one node data file),
a module for assigning a unique color to each job, a schedule prediction module and a history manager.
Earlier versions of LLview used an implicitly defined XML format for the data transfer. The LLview
client still works with that format, so the additional module lml2llview is needed
to translate LML data back into the old data format.
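
The adapter and coloring steps described above can be illustrated with a minimal Python sketch. It is not the LML_da implementation: the function names parse_qstat_full and assign_colors, the sample jobs and the dictionary layout are assumptions made for illustration, and the result is a plain dictionary rather than LML. The sample text follows the general "Job Id:" / "key = value" layout of TORQUE `qstat -f` output and ignores the continuation lines that real output may contain.

```python
import colorsys

# Made-up sample in the style of TORQUE `qstat -f` output (hypothetical jobs).
QSTAT_SAMPLE = """\
Job Id: 101.example
    Job_Name = sim_a
    job_state = R
    exec_host = node01/0+node01/1

Job Id: 102.example
    Job_Name = sim_b
    job_state = Q
"""


def parse_qstat_full(text):
    """Parse `qstat -f` style text into {job_id: {attribute: value}}."""
    jobs = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            # A new job record starts; remember its id.
            current = line.split(":", 1)[1].strip()
            jobs[current] = {}
        elif current is not None and " = " in line:
            # Attribute lines are indented "key = value" pairs.
            key, value = line.strip().split(" = ", 1)
            jobs[current][key] = value
    return jobs


def assign_colors(job_ids):
    """Give each job a distinct hex color by spacing hues evenly."""
    ids = sorted(job_ids)
    colors = {}
    for index, job_id in enumerate(ids):
        hue = index / max(len(ids), 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
        colors[job_id] = "#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)
        )
    return colors


jobs = parse_qstat_full(QSTAT_SAMPLE)
colors = assign_colors(jobs)
for job_id, attributes in jobs.items():
    # Attach the color as one more attribute of the job record.
    attributes["color"] = colors[job_id]
```

Keeping parsing and coloring in separate functions mirrors the modular design described above: each step takes the uniform job records as input and returns them enriched, so further steps can be added without touching the parser.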

Data Flow

LLview can access the data in three different modes. In every case the data is
generated by the server program LML_da.

The LLview client can access the data directly if it runs on the same machine
or is configured to use an SSH connection to the remote machine. In this mode, the
LML_da workflow is executed at every update step, separately for each client.

The usual way is to distribute the data via a web server, which supports clients running
on local desktops. In this case, LLview retrieves the data from the web server using user/password
authentication. The XML file on the web server is updated at regular intervals.
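
As a rough illustration of the web server mode, the following Python sketch builds an authenticated request for the status file. It assumes HTTP Basic authentication, which the text above does not specify; the URL, the credentials and the function names build_status_request and fetch_status are hypothetical.

```python
import base64
import urllib.request


def build_status_request(url, user, password):
    """Build a GET request carrying HTTP Basic credentials (assumed scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status(url, user, password, timeout=30):
    """Download the current XML snapshot and return it as bytes."""
    request = build_status_request(url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical URL and credentials; no network access happens here.
request = build_status_request(
    "https://example.org/llview/status.xml", "alice", "secret"
)
```

A client would call fetch_status at its own update interval. Because the file on the server is refreshed independently, many clients can poll it without triggering additional batch system commands.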

Additionally, LLview provides a mechanism to record data and replay recent usage statistics.
For this purpose, LLview can read a tar file containing the XML files. LLview can also read
XML files stored directly in a directory.
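
Reading recorded snapshots from a tar file can be sketched in Python as follows. This is an illustration only, not LLview's replay code: the file names, the status element, its step attribute and the helper names iter_snapshots and make_demo_archive are assumptions, and ordering the snapshots by file name is a design choice of this sketch.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def iter_snapshots(tar_file):
    """Yield (member name, XML root) for each .xml file, ordered by name."""
    with tarfile.open(fileobj=tar_file, mode="r:*") as archive:
        members = sorted(
            (m for m in archive.getmembers()
             if m.isfile() and m.name.endswith(".xml")),
            key=lambda m: m.name,
        )
        for member in members:
            data = archive.extractfile(member).read()
            yield member.name, ET.fromstring(data)


def make_demo_archive():
    """Create a small in-memory tar archive with two made-up snapshots."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, body in [
            ("snap_002.xml", b"<status step='2'/>"),
            ("snap_001.xml", b"<status step='1'/>"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(body)
            archive.addfile(info, io.BytesIO(body))
    buffer.seek(0)
    return buffer


# Replay the snapshots in name order, regardless of their order in the archive.
replay = [(name, root.get("step")) for name, root in iter_snapshots(make_demo_archive())]
```

Sorting by member name gives a stable replay order when the file names encode the recording time; a directory of XML files could be handled the same way by sorting its file names.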