SPARKSEE High Scalability

by Sparsity Technologies

Configuration

Installation

A complete installation includes all the elements previously described in the architecture: SPARKSEE (SPARKSEEHA configuration), the coordination service (ZooKeeper) and the load balancer. The load balancer is beyond the scope of this document because, as previously stated, it is the developers' decision which one best suits their specific system.

SPARKSEEHA is included in all distributed SPARKSEE packages. Thus, it is not necessary to install any extra package to make the application HA-enabled; it is only a matter of configuration. SPARKSEE can be downloaded as usual from Sparsity's website and used to develop your application. Visit the SPARKSEE documentation site to learn how to use SPARKSEE.

SPARKSEEHA requires Apache ZooKeeper as the coordination service. The latest version of ZooKeeper (v3.4.3) should be downloaded from its website. Once downloaded, it must be installed on all the nodes of the cluster where the coordination service will run. Please note that Apache ZooKeeper requires Java to work. We recommend consulting the Apache ZooKeeper documentation for details on its requirements.

ZooKeeper

The configuration of Apache ZooKeeper can be a complex task, so we refer the user to the Apache ZooKeeper documentation for more detailed instructions.

This section does, however, cover the basic parameters to be used with SPARKSEEHA, as an introduction to ZooKeeper configuration.

Basic ZooKeeper configuration can be performed in the $ZOOKEEPER_HOME/conf/zoo.cfg file. This configuration file must be installed on each node that is part of the coordination cluster.

clientPort

The port on which the server listens for client connections. Clients attempt to connect to this port.

dataDir

The location where ZooKeeper stores the in-memory database snapshots and, unless otherwise specified, the transaction log of updates to the database. Please be aware that the device where the log is located strongly affects performance. A dedicated transaction log device is key to consistently good performance.

tickTime

The length of a single tick, which is the basic time unit used by ZooKeeper, measured in milliseconds. It is used to regulate heartbeats and timeouts. For example, the minimum session timeout is two ticks.

server.x=[hostname]:nnnnn[:nnnnn]

There must be one parameter of this type for each server in the ZooKeeper ensemble. When a server starts up, it determines which server number it is by looking for the myid file in the data directory. This file contains the server number in ASCII, and it should match the x in server.x of this setting. Please take into account that the list of ZooKeeper servers used by the clients must exactly match the list configured in each of the ZooKeeper servers.

Each server entry takes two port numbers nnnnn. The first port is mandatory because the ZooKeeper servers acting as followers use it to connect to the leader. The second port is only used when the leader election algorithm requires it. To test multiple servers on a single machine, each server must use different ports.
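The relationship between the server.x entries and the myid file can be checked with a few lines of code. The following is a minimal sketch in Python, not part of ZooKeeper or SPARKSEE: the helper names parse_servers and myid_is_valid, and the hostnames zk1 and zk2, are hypothetical and only serve as an illustration.

```python
import re

# Matches lines such as "server.1=zk1:2888:3888"; the second port is optional.
SERVER_LINE = re.compile(r"server\.(\d+)=([^:\s]+):(\d+)(?::(\d+))?")


def parse_servers(cfg_text):
    """Return {server_number: (hostname, first_port, second_port_or_None)}."""
    servers = {}
    for raw in cfg_text.splitlines():
        match = SERVER_LINE.fullmatch(raw.strip())
        if match:
            number, host, port1, port2 = match.groups()
            servers[int(number)] = (host, int(port1), int(port2) if port2 else None)
    return servers


def myid_is_valid(myid_text, servers):
    """True if the myid file content names one of the configured servers."""
    text = myid_text.strip()
    return text.isdecimal() and int(text) in servers


# Hypothetical zoo.cfg content used only for this illustration.
example_cfg = """\
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888
"""
servers = parse_servers(example_cfg)
print(servers)
print(myid_is_valid("2\n", servers))
```

A check of this kind can be run on each node before starting the ensemble, so that a myid file that does not correspond to any server.x entry is detected early.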

This is an example of a valid $ZOOKEEPER_HOME/conf/zoo.cfg configuration file:
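A minimal sketch of such a file, assuming a three-server ensemble. The hostnames zk1 to zk3, the data directory and the port numbers are illustrative assumptions and must be adapted to the actual cluster. The initLimit and syncLimit parameters are not described in this section, but ZooKeeper expects them when more than one server is configured; see the Apache ZooKeeper documentation for their meaning.

```
# Basic time unit in milliseconds
tickTime=2000
# Directory for snapshots (and the transaction log, unless configured otherwise)
dataDir=/var/lib/zookeeper
# Port on which clients connect
clientPort=2181
# Ensemble timeouts, expressed in ticks (illustrative values)
initLimit=5
syncLimit=2
# One entry per server: hostname, follower-to-leader port, leader election port
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

With this file, the myid file in the dataDir of zk1 would contain 1, that of zk2 would contain 2, and that of zk3 would contain 3.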

SPARKSEEHA

As previously explained, enabling HA in a SPARKSEE-based application does not require any update of the user's application nor the use of any extra packages. Instead, just a few variables must be defined in the SPARKSEE configuration.

sparksee.ha

Enables or disables HA mode.

Default value: false

sparksee.ha.ip

IP address and port for the instance, given as ip:port. As with ZooKeeper, the user must configure this value in order to use a private or public IP.

Default value: localhost:7777

sparksee.ha.coordinators

Comma-separated list of the ZooKeeper instances. For each instance, the IP address and the port must be given as ip:port. The port must correspond to the one given as clientPort in the ZooKeeper configuration file. There is no need to have a ZooKeeper instance per SPARKSEE instance; in fact, for a small architecture very few ZooKeeper servers are enough.

Default value: ""

sparksee.ha.sync

Synchronization polling time. If 0, polling is disabled and a slave is synchronized only when it receives a write request. Otherwise, the parameter sets the interval at which the slaves poll the master for writes. The polling timer is reset whenever the slave receives a write request, because the slave is synchronized again at that moment.

The time is given in time-units: <X>[D|H|M|S|s|m|u] where <X> is a number followed by an optional character representing the unit: D for days, H for hours, M for minutes, S or s for seconds, m for milliseconds and u for microseconds. If no unit character is given, seconds are assumed.
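The time-unit format can be illustrated with a small converter. This is a minimal sketch in Python and not SPARKSEE's own parser: the function name to_microseconds is hypothetical, and <X> is assumed to be a non-negative integer, which the text above does not state explicitly.

```python
import re

# Microseconds per unit, following the unit letters listed above.
UNIT_MICROSECONDS = {
    "D": 86_400_000_000,  # days
    "H": 3_600_000_000,   # hours
    "M": 60_000_000,      # minutes
    "S": 1_000_000,       # seconds
    "s": 1_000_000,       # seconds
    "m": 1_000,           # milliseconds
    "u": 1,               # microseconds
}

# Assumption: <X> is a non-negative integer, optionally followed by one unit letter.
TIME_UNIT = re.compile(r"(\d+)([DHMSsmu]?)")


def to_microseconds(value):
    """Convert a time-unit string such as '12H' or '30' to microseconds."""
    match = TIME_UNIT.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a valid time-unit value: %r" % value)
    number, unit = match.groups()
    # When no unit character is given, seconds are assumed.
    return int(number) * UNIT_MICROSECONDS[unit or "S"]


print(to_microseconds("12H"))
print(to_microseconds("30"))
```

Working in integer microseconds keeps the conversion exact for every unit, so two settings can be compared without rounding concerns.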

Default value: 0

sparksee.ha.master.history

The master's history log is limited to a certain period of time. Writes older than that period are removed from the log, and the master will not accept requests from SPARKSEE slaves that have not synchronized within that period.

For example, in case of 12H, the master will store in the history log all write operations performed during the previous 12 hours. It will reject requests from a slave which has not been updated in the last 12 hours.

This time is given in time-units, as with the previous variable.

Default value: 1D

Please take into account that slaves should synchronize before the master's history log expires. This happens naturally if the write ratio of the user's application is high enough. Otherwise, a polling value should be set, and it must be shorter than the master's history log time.

These variables must be defined in the SPARKSEE configuration file (sparksee.cfg) or set using the SparkseeConfig class. More details on how to configure SPARKSEE can be found on the documentation site.
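As an illustration, the HA variables for one instance could look as follows in sparksee.cfg. This is a sketch under stated assumptions: it assumes the usual key=value syntax with # comments, the IP addresses are hypothetical, and the coordinator port 2181 is assumed to be the clientPort set in zoo.cfg. The polling time (600 seconds) is deliberately shorter than the master's history log time (1 day).

```
# Hypothetical addresses; adapt them to the actual cluster
sparksee.ha=true
sparksee.ha.ip=192.168.1.10:7777
sparksee.ha.coordinators=192.168.1.2:2181,192.168.1.3:2181,192.168.1.4:2181
sparksee.ha.sync=600s
sparksee.ha.master.history=1D
```

Each SPARKSEE instance would use its own value for sparksee.ha.ip, while the list of coordinators is the same for all instances.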

SPARKSEEHA activation can be checked by confirming that ZooKeeper has a node with a String containing the master's IP.