About Brisk Packaged Installations

The packaged releases create a cassandra user. When Brisk is started as a service, the Cassandra and Hadoop tracker services run as this user. A service initialization script is located in /etc/init.d/brisk. Run levels are not set by the package.

The package installs into the following directories:

Brisk / Cassandra Directories

/var/lib/cassandra (Cassandra and CassandraFS data directories)

/var/log/cassandra

/var/run/cassandra

/usr/share/brisk/cassandra (Cassandra environment settings)

/usr/share/brisk/cassandra/lib

/usr/share/brisk-demos (Portfolio Manager demo application)

/usr/bin

/usr/sbin

/etc/brisk/cassandra (Cassandra configuration files)

/etc/init.d

/etc/security/limits.d

/etc/default/

Hadoop Directories

/usr/share/brisk/hadoop (Hadoop environment settings)

/etc/brisk/hadoop (Hadoop configuration files)

Hive Directories

/usr/share/brisk/hive (Hive environment settings)

/etc/brisk/hive (Hive configuration files)

Pig Directories

/usr/share/brisk/pig (Pig environment settings)

/etc/brisk/pig (Pig configuration files)

Next Steps

Configuring and Initializing a Brisk Cluster

Before you can start Brisk, whether on a single-node or multi-node cluster, there are a few Cassandra configuration properties you must set on each node in the cluster. These are set in the cassandra.yaml file (located in /etc/brisk/cassandra in packaged installations, or in $BRISK_HOME/resources/cassandra/conf in binary distributions).

Initializing a Single-Node Brisk Cluster (for evaluation purposes)

Brisk is intended to be run on multiple nodes; however, you may want to start with a single-node Brisk cluster for evaluation purposes. To start Brisk on a single node:

Set the following properties in the cassandra.yaml file:

cluster_name: 'BriskTest'
initial_token: 0

Start Brisk.

brisk cassandra -t

The -t option starts Cassandra (with CassandraFS) and the Hadoop Job Tracker and Task Tracker services. Because there is no Hadoop NameNode with CassandraFS, there is no additional configuration needed to run MapReduce jobs in single-node mode versus distributed mode.

When running on a single node, there are no additional steps to configure the Cassandra seed node and Brisk job tracker node, as they are automatically set to localhost.

Initializing a Multi-Node Brisk Cluster

Before you start a multi-node Brisk cluster you must determine the following:

A name for your cluster

How many total nodes your Brisk cluster will have

The IP addresses of each node

The token for each node (see Generating Tokens). If you are deploying a mixed-workload Brisk Cluster, make sure to alternate token assignments between Cassandra nodes and Brisk nodes so that replicas are evenly distributed around the Cassandra ring.

Which nodes will serve as the seed nodes. If you are configuring a mixed-workload cluster, you should have at least one seed node for each side (the Cassandra real-time side and the Brisk analytics side).

If you intend to run a mixed-workload cluster, determine which nodes will serve which purpose.

For example, suppose you are starting a 6-node mixed-workload cluster with 3 Brisk nodes and 3 Cassandra nodes. The nodes have the following IP addresses:

node0 (Cassandra seed) 110.82.155.0

node1 (Cassandra) 110.82.155.1

node2 (Cassandra) 110.82.155.2

node3 (Brisk seed) 110.82.155.3

node4 (Brisk) 110.82.155.4

node5 (Brisk) 110.82.155.5

The cassandra.yaml file for each node would have the following modified property settings. Note that in a mixed-workload cluster, the token placement alternates between Cassandra and Brisk nodes. This ensures even distribution of replicas on both sides of the cluster. For example:

node 0: 0

node 3: 28356863910078205288614550619314017621

node 1: 56713727820156410577229101238628035242

node 4: 85070591730234615865843651857942052864

node 2: 113427455640312821154458202477256070485

node 5: 141784319550391026443072753096570088106
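The alternating layout above can be reproduced with a short Python sketch. The node names and the two workload groups simply follow the example in this section; nothing in the sketch is Brisk-specific:

```python
# Sketch: evenly spaced RandomPartitioner tokens for a 6-node cluster,
# alternated between the Cassandra (real-time) and Brisk (analytics)
# groups so replicas land evenly on both sides of the ring.
num_nodes = 6
tokens = [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

cassandra_nodes = ["node0", "node1", "node2"]  # real-time side
brisk_nodes = ["node3", "node4", "node5"]      # analytics side

assignment = {}
for rank, token in enumerate(tokens):
    # Even positions on the ring go to Cassandra nodes, odd positions
    # to Brisk nodes, alternating around the ring.
    if rank % 2 == 0:
        assignment[cassandra_nodes[rank // 2]] = token
    else:
        assignment[brisk_nodes[rank // 2]] = token

for node in sorted(assignment):
    print(node, assignment[node])
```

Running this reproduces the token table shown above, with node0 through node2 and node3 through node5 interleaved around the ring.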

Node0

cluster_name: 'BriskTest'
initial_token: 0
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0

Node1

cluster_name: 'BriskTest'
initial_token: 56713727820156410577229101238628035242
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.1
rpc_address: 0.0.0.0

Node2

cluster_name: 'BriskTest'
initial_token: 113427455640312821154458202477256070485
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.2
rpc_address: 0.0.0.0

Node3

cluster_name: 'BriskTest'
initial_token: 28356863910078205288614550619314017621
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.3
rpc_address: 0.0.0.0

Node4

cluster_name: 'BriskTest'
initial_token: 85070591730234615865843651857942052864
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.4
rpc_address: 0.0.0.0

Node5

cluster_name: 'BriskTest'
initial_token: 141784319550391026443072753096570088106
seed_provider:
    - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.5
rpc_address: 0.0.0.0

Generating Tokens

Tokens are used to assign a range of data to a particular node. Assuming you are using the RandomPartitioner, spacing the tokens evenly around the ring ensures even data distribution.

Create a new file for your token generator program:

vi tokengentool

Paste the following Python program into this file:

#! /usr/bin/python
import sys
if (len(sys.argv) > 1):
    num = int(sys.argv[1])
else:
    num = int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    print 'node %d: %d' % (i, (i * (2 ** 127) / num))

Save and close the file and make it executable:

chmod +x tokengentool

Run the script:

./tokengentool

When prompted, enter the total number of nodes in your cluster:

How many nodes are in your cluster? 6
node 0: 0
node 1: 28356863910078205288614550619314017621
node 2: 56713727820156410577229101238628035242
node 3: 85070591730234615865843651857942052864
node 4: 113427455640312821154458202477256070485
node 5: 141784319550391026443072753096570088106

On each node, edit the cassandra.yaml file and enter its corresponding token value in the initial_token property.
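If you are scripting this per-node edit, a minimal Python sketch like the following can rewrite the initial_token line in a cassandra.yaml. The regex-based in-place edit is an illustration only; a YAML-aware tool would be more robust:

```python
import re

def set_initial_token(yaml_text, token):
    # Rewrite an existing "initial_token:" line in place. This is a
    # minimal sketch, not a full YAML parser.
    return re.sub(r"(?m)^initial_token:.*$",
                  "initial_token: %d" % token, yaml_text)

conf = "cluster_name: 'BriskTest'\ninitial_token: 0\n"
print(set_initial_token(conf, 85070591730234615865843651857942052864))
```

You would read each node's cassandra.yaml, apply the function with that node's token, and write the result back.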

Starting a Brisk Cluster

After you have installed and configured Brisk on one or more nodes, you are ready to start your Brisk cluster. If you want to run a multi-node Brisk cluster, you must first install the Brisk packages on each node, and then configure each node according to the instructions in Initializing a Brisk Cluster.

Packaged installations include startup scripts for running Brisk as a service. Binary packages do not.

Starting Brisk as a Stand-Alone Process

If running a mixed workload cluster, determine which nodes to start as Cassandra nodes and which nodes to start as Brisk nodes. To start Brisk as a service see Starting Brisk as a Service. Otherwise, you can start the Brisk server process as follows:

On a Brisk node:

brisk cassandra -t

On a Cassandra node:

brisk cassandra

Starting Brisk as a Service

Packaged installations provide startup scripts in /etc/init.d for starting Brisk as a service. Before starting Brisk as a service on a node, you must first configure the Cassandra service to start the Hadoop Job Tracker and Task Tracker services as well.

Note

For mixed-workload clusters, nodes that are Cassandra-only can simply start the Cassandra service and skip the first step below (creating /etc/default/brisk).

Create the file /etc/default/brisk, and add the following line as the contents of this file:

HADOOP_ENABLED=1

Start the Brisk service:

sudo service brisk start

Note

On Enterprise Linux systems, the Brisk service runs as a java process. On Debian systems, the Brisk service runs as a jsvc process.

Installing the Brisk Binary Distribution

To run Brisk, you will need to install a Java Virtual Machine (JVM). DataStax recommends installing the most recently released version of the Sun JVM. Versions earlier than 1.6.0_19 are specifically not recommended.

Download the distribution to a location on your machine and unpack it:

tar -xvf brisk-1.0-beta1-bin.tar

For convenience, you may want to set the following environment variables:

export BRISK_HOME=<install_location>/brisk-<version>
export PATH=$PATH:$BRISK_HOME/bin

Create the data and logging directories needed by Brisk Cassandra. By default, Cassandra uses /var/lib/cassandra and /var/log/cassandra. To create these directories, run the following commands, where $USER and $GROUP are the user and group that will run Brisk:

sudo mkdir /var/lib/cassandra
sudo mkdir /var/log/cassandra
sudo chown -R $USER:$GROUP /var/lib/cassandra
sudo chown -R $USER:$GROUP /var/log/cassandra

About Brisk Binary Installations

Brisk Directories

bin (Brisk start scripts)

demos (Portfolio Manager Demo)

interface

javadoc

lib

resources/cassandra/bin (Cassandra utilities)

resources/cassandra/conf (Cassandra configuration files)

resources/hadoop (Hadoop installation)

resources/hive (Hive installation)

resources/pig (Pig installation)

Installing JNA

Installing JNA (Java Native Access) on Linux platforms can improve Brisk memory usage. With JNA installed and configured as described in this section, Linux does not swap out the JVM, and thus avoids related performance issues.

Installing OpsCenter on Debian and Ubuntu

OpsCenter packages are available from DataStax. You will need a username and password to access the OpsCenter package repositories. If you registered online, these credentials should have been sent to you in an email. If you do not have your OpsCenter credentials, contact DataStax Support.

These instructions assume that you have the aptitude package management application installed, and that you have root access on the machine where you are installing. If you have not already, log in as root. Optionally, you can run the commands using sudo.

In the /etc/apt/sources.list file, add a line for the DataStax OpsCenter repository, where <username> and <password> are the username and password from your OpsCenter registration email. Note the different repository locations for the free and paid versions of OpsCenter. For the free version of OpsCenter:

deb http://username:password@deb.opsc.datastax.com/free unstable main

For the paid version of OpsCenter:

deb http://username:password@deb.opsc.datastax.com unstable main

In this file, also add a line for the general DataStax repository (for installing dependent packages such as jna). Add the appropriate repository location for your operating system, where OSType is lenny, lucid, maverick, squeeze, or natty:

deb http://debian.datastax.com/OSType OSType main

For example, if installing on Ubuntu 10.10 (Maverick):

deb http://debian.datastax.com/maverick maverick main

If installing on Debian 5.0 (Lenny), add the lenny-backports repository definition as well:

deb http://debian.datastax.com/lenny lenny main
deb http://backports.debian.org/debian-backports lenny-backports main

Save and close the /etc/apt/sources.list file after you are done adding the appropriate DataStax repositories.

(Debian 5.0 Only) If installing on Debian 5.0 (Lenny), run the following commands as well.

# aptitude install debian-archive-keyring
# aptitude install python-support=1.0.3~bpo50+1

Install the OpsCenter package using aptitude. Note the different package names for the free and paid versions of OpsCenter. For the free version of OpsCenter:

# aptitude update
# aptitude install opscenter-free

For the paid version of OpsCenter:

# aptitude update
# aptitude install opscenter

Installing OpsCenter on RHEL and CentOS

DataStax provides yum repositories for RedHat Enterprise Linux (RHEL) and CentOS versions 5.4, 5.5 and 5.6. There are different package repositories for the free and paid versions of OpsCenter.

These instructions assume that you have the yum package management application installed, and that you have root access on the machine where you are installing the OpsCenter console. If you have not already, log in as root. Optionally, you can run the commands using sudo.

EPEL (Extra Packages for Enterprise Linux) contains dependent packages required by OpsCenter, such as jna and jpackage-utils. EPEL must be installed on the OpsCenter machine. To install the epel-release package:

Add a yum repository specification for the DataStax OpsCenter repository in /etc/yum.repos.d. For example:

# vi /etc/yum.repos.d/opscenter.repo

In this file add the following lines where <username> and <password> are the username and password from your OpsCenter registration email. Note the different repository locations for the free and paid versions of OpsCenter.

Install the OpsCenter package using yum. Note the different package names for the free and paid versions of OpsCenter. For the free version of OpsCenter:

# yum install opscenter-free

For the paid version of OpsCenter:

# yum install opscenter

About Your OpsCenter Installation

The OpsCenter packaged releases create an opscenter user. When the OpsCenter dashboard is started as a service, it runs as this user. A service initialization script is located in /etc/init.d. Run levels are not set by the package.

Before starting OpsCenter and installing agents, make the required settings described in Configuring OpsCenter.