The Cassandra PV Archiver server is the central component of the archiving
system.
It is responsible for monitoring process variables (channels in the
terminology of the Cassandra PV Archiver) for changes and writing these
changes to the archive.
At the same time, it is also responsible for providing access to the data
stored in the archive through a web-service interface.
This chapter explains how to install, configure, and use the Cassandra PV
Archiver server.

1. Prerequisites

The Cassandra PV Archiver server is a pure Java application.
This means that it can run on any platform that provides a Java SE 7 (or
newer) runtime environment (JRE).
Even though the JRE is sufficient for running the Cassandra PV Archiver
server, users are encouraged to install the Java Development Kit (JDK)
because of the additional diagnostic tools it provides.
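To check whether a given Java runtime meets this requirement, its
"java.specification.version" system property can be inspected.
The following is a minimal sketch, not part of the Cassandra PV Archiver;
the class and method names are hypothetical.
It assumes the version format used by Java itself: "1.7" and "1.8" for
Java 7 and 8, and plain major numbers such as "11" from Java 9 onward.

```java
// Hypothetical helper (not part of the Cassandra PV Archiver): checks whether
// the running JRE meets the Java SE 7 minimum stated above.
final class JavaVersionCheck {

    /** Minimum Java SE major version stated in the prerequisites. */
    static final int MINIMUM_MAJOR_VERSION = 7;

    /**
     * Extracts the major version from a "java.specification.version" value.
     * Java 8 and older report "1.x"; Java 9 and newer report "9", "11", etc.
     */
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2); // "1.7" -> "7"
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot); // drop any minor component
        }
        return Integer.parseInt(v);
    }

    static boolean isSupported(String specVersion) {
        return majorVersion(specVersion) >= MINIMUM_MAJOR_VERSION;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec
                + (isSupported(spec) ? " (supported)" : " (too old)"));
    }
}
```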

The Cassandra PV Archiver server has been tested on Linux, OS X, and
Windows.
On some of these platforms, it may use the JNA library to access
platform-specific functions.
However, the availability of these functions is not critical for the
operation of the Cassandra PV Archiver server.

In addition to the JRE or JDK, an Apache Cassandra cluster is needed.
Users who want to set up an Apache Cassandra cluster are encouraged to
check out the Cassandra distributions available at Planet Cassandra.
The Cassandra PV Archiver server is compatible with Cassandra 2.2 and
3.x.
It is likely to be compatible with newer versions of Cassandra as well.
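As an illustration of this compatibility statement, the following sketch
classifies a Cassandra release version string, such as the one reported by
"nodetool version".
The helper and its category names are hypothetical and not part of the
Cassandra PV Archiver; "probably compatible" only reflects the expectation
that newer versions will work, not a tested guarantee.

```java
// Hypothetical helper (not part of the Cassandra PV Archiver): classifies a
// Cassandra release version against the compatibility statement above.
final class CassandraVersionCheck {

    enum Compatibility {
        SUPPORTED,           // Cassandra 2.2 or 3.x
        PROBABLY_COMPATIBLE, // newer than 3.x, not explicitly covered
        UNSUPPORTED          // older than 2.2
    }

    static Compatibility classify(String releaseVersion) {
        // Split on '.' and '-' so that "3.11.4" and "4.0-beta1" both yield
        // the major and minor numbers as the first two components.
        String[] parts = releaseVersion.trim().split("[.-]");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        if (major > 3) {
            return Compatibility.PROBABLY_COMPATIBLE;
        }
        if (major == 3 || (major == 2 && minor >= 2)) {
            return Compatibility.SUPPORTED;
        }
        return Compatibility.UNSUPPORTED;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.1.15", "2.2.8", "3.11.4", "4.0.1"}) {
            System.out.println(v + " -> " + classify(v));
        }
    }
}
```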

In the simplest case, the Cassandra cluster may consist of only a single
node running on the same system as the Cassandra PV Archiver server.
In general, it is a good idea to colocate Cassandra PV Archiver server
nodes and Apache Cassandra nodes on the same set of computers, but
technically speaking, there is no need for such a setup and the two
software components can safely be separated into two sets of computers
if this is preferred for administrative reasons.

Installing the JRE or JDK and the Cassandra cluster is outside the scope
of this document.
Readers are encouraged to refer to the documentation of the JRE / JDK of
their choice for installation instructions.
On most Linux distributions, choosing the JRE / JDK available from the
distribution’s repositories is typically the best choice.
For setup instructions for Apache Cassandra, please refer to the
Cassandra documentation provided by DataStax.

1.1. Clock synchronization

For operation of both Apache Cassandra and the Cassandra PV Archiver
server, it is critical that the clocks of all servers are well
synchronized.
In an Apache Cassandra database, a large clock skew can lead to data
corruption.
The administrator should take appropriate measures to synchronize the
servers’ clocks and to monitor the clock skew.

The setup of a proper clock synchronization solution is outside the
scope of this document.
At a minimum, the administrator should provide two NTP servers with
which all servers are synchronized.
These servers should be synchronized with each other and with some
external reference, preferably a set of low-stratum NTP servers or
even a GPS clock.
NTP servers should typically run on physical hosts, not inside virtual
machines.
Many virtual machine solutions do not provide an adequately stable
clock, so that NTP servers might be unreliable when running inside a
virtual machine.

The Cassandra PV Archiver server contains a rudimentary clock-skew
monitoring system that tries to detect the clock skew between the
servers.
When this system detects that the clock of a server is skewed by more
than 800 ms, it logs a warning.
When it detects that the clock is skewed by more than 1200 ms, it
immediately kills the server.
The server is also killed when the monitoring process detects that the
server’s clock skipped back in time.
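The rules described above can be summarized as a small decision function.
The following Java sketch is purely illustrative: the class
ClockSkewPolicy, its method, and the way the skew estimate is obtained are
assumptions and do not reflect the server's actual implementation.
It only encodes the rules stated in this section: warn above 800 ms, kill
above 1200 ms, and kill when the local clock moves backward.

```java
// Hypothetical sketch of the policy described above; names are illustrative
// and do not reflect the Cassandra PV Archiver's actual implementation.
final class ClockSkewPolicy {

    enum Action { OK, WARN, KILL }

    static final long WARN_THRESHOLD_MS = 800;
    static final long KILL_THRESHOLD_MS = 1200;

    // Last local wall-clock reading, used to detect backward jumps.
    private long lastLocalTimeMs = Long.MIN_VALUE;

    /**
     * @param localTimeMs     current local wall-clock time in milliseconds
     * @param estimatedSkewMs estimated offset of the local clock relative to
     *                        the other servers (sign is ignored)
     */
    Action evaluate(long localTimeMs, long estimatedSkewMs) {
        if (localTimeMs < lastLocalTimeMs) {
            return Action.KILL; // the clock skipped back in time
        }
        lastLocalTimeMs = localTimeMs;
        long skew = Math.abs(estimatedSkewMs);
        if (skew > KILL_THRESHOLD_MS) {
            return Action.KILL;
        }
        if (skew > WARN_THRESHOLD_MS) {
            return Action.WARN;
        }
        return Action.OK;
    }

    public static void main(String[] args) {
        ClockSkewPolicy policy = new ClockSkewPolicy();
        long now = System.currentTimeMillis();
        System.out.println(policy.evaluate(now, 250));        // OK
        System.out.println(policy.evaluate(now + 1000, 950)); // WARN
        System.out.println(policy.evaluate(now + 500, 0));    // KILL (clock went backward)
    }
}
```

The comparisons are strict, matching the wording "more than 800 ms" and
"more than 1200 ms", so a skew of exactly 800 ms does not yet trigger a
warning.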

Due to inherent limitations of the implementation (for example, the use
of a TCP-based protocol), this mechanism typically underestimates the
actual clock skew.
For this reason, additional means should be used for monitoring the
clock skew, and the mechanism provided by the Cassandra PV Archiver
server should only be considered a “last line of defense” in case all
other mechanisms fail.
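To illustrate why a measurement over the network can only approximate the
skew, the following sketch estimates the offset of a remote clock from a
single request/response exchange, the basic idea behind Cristian's
algorithm and NTP.
It is not necessarily how the Cassandra PV Archiver measures skew; the
class OffsetEstimate and its parameters are hypothetical.
The key point is that the remote timestamp could have been taken at any
moment during the round trip, so the offset is only known to within half
the round-trip time.

```java
// Hypothetical illustration (not the Cassandra PV Archiver's code): estimates
// the offset of a remote clock from one request/response exchange.
final class OffsetEstimate {

    final long offsetMs;      // estimated remote clock minus local clock
    final long uncertaintyMs; // true offset lies within offsetMs +/- uncertaintyMs

    private OffsetEstimate(long offsetMs, long uncertaintyMs) {
        this.offsetMs = offsetMs;
        this.uncertaintyMs = uncertaintyMs;
    }

    /**
     * @param sendLocalMs    local time when the request was sent
     * @param remoteMs       timestamp reported by the remote server
     * @param receiveLocalMs local time when the response arrived
     */
    static OffsetEstimate fromExchange(long sendLocalMs, long remoteMs, long receiveLocalMs) {
        if (receiveLocalMs < sendLocalMs) {
            throw new IllegalArgumentException("local clock went backward");
        }
        long roundTripMs = receiveLocalMs - sendLocalMs;
        // Assume the remote timestamp was taken halfway through the round trip.
        long midpointLocalMs = sendLocalMs + roundTripMs / 2;
        // Rounding up keeps the bound valid for odd round-trip times.
        return new OffsetEstimate(remoteMs - midpointLocalMs, (roundTripMs + 1) / 2);
    }

    public static void main(String[] args) {
        // Request sent at 1000 ms and response received at 1100 ms (local
        // clock); the remote server reported 5200 ms.
        OffsetEstimate e = fromExchange(1000, 5200, 1100);
        System.out.println(e.offsetMs + " ms +/- " + e.uncertaintyMs + " ms"); // 4150 ms +/- 50 ms
    }
}
```

A longer or more variable round trip widens this uncertainty, which is one
reason why a dedicated time-synchronization and monitoring setup gives more
reliable figures than such an in-application estimate.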