LogService

The DIET platform can be monitored using a system called LogService.
This monitoring service collects whatever information you choose to relay
from the platform and makes it available to monitoring tools. As shown in
Figure 11.1, LogService is composed of three modules:
LogComponent, LogCentral and LogTool.

Figure 11.1: DIET and LogService.

- A LogComponent is attached to a component and relays
  information and messages to LogCentral. LogComponents are typically used
  within components one wants to monitor.

- LogCentral collects the messages received from
  LogComponents, then stores them or forwards them to LogTools.

- LogTools connect themselves to LogCentral and wait
  for messages. LogTools are typically used within monitoring tools.

The main interest of LogService is that information is collected at a central
point, LogCentral, which receives logEvents from the
LogComponents attached to DIET elements (MA, LA and
SeD). LogCentral can re-send this information to
several tools (LogTools), which are responsible for analysing these
messages and presenting comprehensive information to the user.

LogService defines and implements several functionalities:

Filtering mechanisms

As few messages as possible should be sent to
minimize network traffic. With respect to the three-tier model, the
communications between applications (e.g., LogComponent) and the collector
(e.g., LogCentral), as well as between the collector and the monitoring tools
(e.g., LogTools), should be minimized. When a LogTool registers with the
LogCentral, it also registers a filter defining which messages are required
by the tool.
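
The registration of a filter together with a tool can be pictured with the following minimal sketch. It is not the LogService API: the class names (LogMessage, ToolFilter, Collector), the message fields, and the tag names are hypothetical and chosen for illustration only, and the sketch assumes a filter is a set of shell-style patterns on the component name and on the message tag.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class LogMessage:
    component: str  # name of the emitting component (hypothetical field)
    tag: str        # message type (hypothetical field)
    text: str


@dataclass
class ToolFilter:
    """Hypothetical filter: shell-style patterns on component name and tag."""
    components: list = field(default_factory=lambda: ["*"])
    tags: list = field(default_factory=lambda: ["*"])

    def accepts(self, msg: LogMessage) -> bool:
        return (any(fnmatchcase(msg.component, p) for p in self.components)
                and any(fnmatchcase(msg.tag, p) for p in self.tags))


class Collector:
    """Stand-in for the central collector; not the real LogCentral interface."""

    def __init__(self):
        self._tools = {}  # tool name -> (filter, inbox)

    def register_tool(self, name, tool_filter):
        # The tool hands over its filter at registration time.
        inbox = []
        self._tools[name] = (tool_filter, inbox)
        return inbox

    def publish(self, msg):
        # Forward the message only to tools whose filter accepts it.
        for tool_filter, inbox in self._tools.values():
            if tool_filter.accepts(msg):
                inbox.append(msg)


collector = Collector()
inbox = collector.register_tool(
    "viewer", ToolFilter(components=["SeD*"], tags=["BEGIN_SOLVE", "END_SOLVE"]))
collector.publish(LogMessage("SeD1", "BEGIN_SOLVE", "solve started"))
collector.publish(LogMessage("MA1", "IN", "agent connected"))  # filtered out
```

This sketch filters only on the collector-to-tool link. Reducing traffic between the components and the collector as well would require the collector to tell each component which messages at least one tool currently wants, which is not shown here.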

Message ordering

Event ordering is another important feature of a
monitoring system. LogService handles this problem by the introduction of a
global time line. Each message receives a time-stamp when it is generated. The
problem is that the system time can differ from host to host.
LogService measures this difference internally and corrects the
time-stamps of incoming messages accordingly. The correction uses the
time difference measured during the last
ping that LogCentral sent to the LogComponent (pings are sent
periodically to verify the "aliveness" of the LogComponent).
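
The time-stamp correction can be illustrated as follows. The text does not say how LogService measures the clock difference, so this sketch makes an explicit assumption: the network delay of a ping is symmetric, so the component reads its clock at the midpoint of the round trip. The class and method names are hypothetical.

```python
class ClockOffsets:
    """Keeps, per component, the offset measured during the most recent ping."""

    def __init__(self):
        self._offset = {}  # component name -> seconds to add to its local time

    def record_ping(self, component, central_send, component_clock, central_recv):
        # Assumption: symmetric delay, so the component's clock was read
        # halfway between sending the ping and receiving the reply.
        midpoint = (central_send + central_recv) / 2.0
        self._offset[component] = midpoint - component_clock

    def to_global(self, component, local_timestamp):
        # Components that were never pinged are left uncorrected.
        return local_timestamp + self._offset.get(component, 0.0)


offsets = ClockOffsets()
# The ping left at 100.0 and the reply arrived at 102.0 (central clock);
# the component reported 40.0 on its own clock.
offsets.record_ping("SeD1", central_send=100.0, component_clock=40.0,
                    central_recv=102.0)
corrected = offsets.to_global("SeD1", 45.0)  # 45.0 + (101.0 - 40.0) = 106.0
```

Because only the last measurement is kept, the correction follows clock drift at the rate of the pings.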

However, incoming messages are still unsorted. Thus, the messages are
buffered for a short period of time in order to deliver a sorted stream of
messages to the tools. Messages that arrive out of order within this time
are sorted in the buffer and can thus be properly delivered. Although this
induces a delivery-delay for messages, this mechanism guarantees the proper
ordering of messages within a certain tolerance. As tools should not rely
on true real-time delivery of messages, this short delay is acceptable.
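
A minimal sketch of such a delay buffer is shown below, assuming corrected time-stamps expressed in seconds and a fixed buffering delay. The names (ReorderBuffer, push, pop_ready) are hypothetical and do not come from LogService.

```python
import heapq
import itertools


class ReorderBuffer:
    """Holds messages for `delay` seconds and releases them in time-stamp order."""

    def __init__(self, delay):
        self._delay = delay
        self._heap = []
        # Tie-breaker: messages with equal time-stamps keep their arrival order.
        self._seq = itertools.count()

    def push(self, timestamp, message):
        heapq.heappush(self._heap, (timestamp, next(self._seq), message))

    def pop_ready(self, now):
        """Return, oldest first, every message stamped at or before now - delay."""
        ready = []
        while self._heap and self._heap[0][0] <= now - self._delay:
            timestamp, _, message = heapq.heappop(self._heap)
            ready.append((timestamp, message))
        return ready


buf = ReorderBuffer(delay=1.0)
buf.push(10.2, "b")  # arrives first but was generated later
buf.push(10.0, "a")
buf.push(10.9, "c")
first = buf.pop_ready(now=11.3)   # releases "a" then "b"; "c" is still held
second = buf.pop_ready(now=12.0)  # releases "c"
```

The delay is the tolerance mentioned above: a message that arrives more than `delay` seconds after its time-stamp may be delivered after messages that are newer than it.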

The System State Problem

A problem that arises in distributed
environments is keeping track of the state of the application. This state may
contain, for example, information on connected servers, their relationships,
the active tasks and many other pieces of information that depend on the
application.
The system state can be constructed from all events that occurred in the
application. Some tools rely on this state to work properly.

The problem emerges if those specific tools do not receive all messages.
This might occur as tools can connect to the monitor after the application
has been started. In fact, this is quite probable as the lifetime of the
distributed application can be much longer than the lifetime of a tool.

As a consequence, the system state must be maintained and stored. In order
to maintain a system state in a general way, LogService does not store the
system state itself, but all messages which are required to construct it.
Those messages are identified by their tag and stored in a special list.
This list is forwarded to each tool that connects. For the tool this
process is transparent, since it simply receives a number of messages that
represent the state of the application.

In order to further refine this concept, the list of important messages can
also be cleaned up by LogService. This is necessary as components may
connect and disconnect at runtime. After a disconnection of a component the
respective information is no longer relevant for the system state.
Therefore, all messages which originated at this component can be removed
from the list. They have become obsolete due to the disconnection of the
component and can be safely deleted in order to reduce the length of the
list of important messages to a minimum.
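
The handling of the list of important messages can be sketched as follows. This is an illustration under stated assumptions, not the LogService implementation: the class name StateStore, the tuple layout of a message, and the tag names are hypothetical.

```python
class StateStore:
    """Keeps the messages needed to rebuild the system state."""

    def __init__(self, state_tags):
        self._state_tags = set(state_tags)  # tags marking state-relevant messages
        self._messages = []                 # (component, tag, text), arrival order

    def on_message(self, component, tag, text):
        # Only messages whose tag is marked as state-relevant are stored.
        if tag in self._state_tags:
            self._messages.append((component, tag, text))

    def on_disconnect(self, component):
        # Messages from a disconnected component are obsolete and are dropped.
        self._messages = [m for m in self._messages if m[0] != component]

    def replay(self):
        """Messages forwarded to a tool when it connects."""
        return list(self._messages)


store = StateStore(state_tags={"IN"})
store.on_message("SeD1", "IN", "SeD1 joined")
store.on_message("SeD1", "BEGIN_SOLVE", "job 1")  # not state-relevant, not stored
store.on_message("SeD2", "IN", "SeD2 joined")
before = store.replay()
store.on_disconnect("SeD1")
after = store.replay()  # only the SeD2 message remains
```

A tool that connects late receives the result of replay() first and then the live stream, so it cannot tell stored messages from new ones.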

All DIET components implement the LogComponent interface. The DIET
architecture can therefore relay information to LogCentral,
and a LogTool can then connect to LogCentral to
collect, store and analyse this information. LogService is available for
download. See the web page http://graal.ens-lyon.fr/DIET/logservice.html for more information.