Chapter 1. Admin Guide

1. Overview

Db_cassandra is one of the Kamailio database modules. It does
not export any functions executable from the configuration scripts,
but it exports a subset of functions using the database API, and thus,
other modules can use it as a database driver, instead of, for
example, the Mysql module.

The storage backend is a Cassandra cluster and
this module provides an SQL interface to be used by other modules for
storing and retrieving data. Because Cassandra is a NoSQL distributed
system, there are limitations on the operations that can be performed.
The limitations concern the indexes on which queries are performed, as
it is only possible to have simple conditions (equality comparison only)
and only two indexing levels. These issues will be explained in an example
below.

Cassandra DB is especially suited for storing large amounts of data or data
that requires distribution, redundancy or replication. One usage example is
a distributed location system in a platform that has a cluster of SIP Router
servers, with several proxies and registration servers accessing the same location
database. This was actually the main use case we had in mind when implementing
this module. Please NOTE that it has only been tested with the
usrloc, auth_db and domain modules.

You can find a configuration file example for this usage in the module - kamailio_cassa.cfg.

Because the module has to do the translation from SQL to Cassandra NoSQL
queries, the schemas for the tables must be known by the module.
You will find the schemas for location, subscriber and version tables in
utils/kamctl/dbcassandra directory. You have to provide the path to the
directory containing the table definitions by setting the module parameter
schema_path.

There is no need to configure a table metadata in Cassandra cluster.
You only need to define a keyspace with the name of the database and for each table
a column family inside that keyspace with the name of the table. The comparator
and validators should be either UTF8Type or ASCIIType.
Example:

Special attention was given to performance in Cassandra. Therefore, the
implementation uses only the native row indexing in Cassandra and no secondary
indexes, because they are costly. Instead, we simulate a secondary index by
using the column names and putting information in them, which is very efficient.
Also, for deleting expired records, we let Cassandra take care of this with
its own mechanism (by setting the TTL for columns).

The module supports raw queries. However these queries must follow the
CQL (Cassandra Query Language) syntax. The queries can be issued
in the script by means of the AVPOPS module. Keep in mind that when passing back
the results from the database only the first row is used to set the AVP variables.
(default AVPOPS behaviour)
The script lines below can be used as an example for issuing the query towards
an cassandra instance.(This example will work once the column family `location`
is configured correctly in the cassandra keyspace)

2. Dependencies

2.1. SIP Router Modules

2.2. External Libraries or Applications

The following libraries or applications must be installed before running
SIP Router with this module loaded:

Thrift library
(tested with version 0.6.1 and version 0.7.0).
You can download it from http://archive.apache.org/dist/thrift .

The implementation was tested with Cassandra version 1.0.1
and version 1.1.6. I used the sourced from DataStax Community
Edition (http://www.datastax.com/download/community).

3. Parameters

3.1. schema_path (string)

The directory where the files with the table schemas are located.
This directory has to contain the subdirectories corresponding to the
database name (name of the directory = name of the database).
These directories, in turn, contain the files with the table schemas.
See the schemas in utils/kamctl/dbcassandra directory.

6. Table schema

The module must know the table schema for the tables that will be used.
You must configure the path to the schema directory by setting the
schema_path parameter.

A table schema document has the following structure:

First row: the name and type of the columns in the form name(type)
separated by spaces. The possible types are: string, int, double and timestamp.

Thetimestamp type has a special meaning. Only one column of this type can
be defined for a table, and it should contain the expiry time for that record.
If defined this value will be used to compute the ttl for the columns
and Cassandra will automatically delete the columns when they expire. Because we want the
ttl to have meaning for the entire record, we must ensure that when the ttl is updated, it
is updated for all columns for that record. In other words, to update the expiration time
of a record, an insert operation must be performed from the point of view of the db_cassandra
module ("insert" in Cassandra means "replace if exists or insert new record otherwise"). So, if
you define a table with a timestamp column, the update operations on that table that also
update the timestamp must update all columns. So, these update operations must in fact be insert
operations.

Second row: the columns that form the row key separated by space.

Third row: the columns that form the secondary key separated by space.

Below you can see the schema for the location
table (when use_domain not set):

Observe first that the row key is the username and
the secondary index is the contact.
We have also defined a timestamp column - expires.

If you need to use the domain part of the AOR also (you have set use_domain
parameter for usrloc in the script), you should include the domain column in
the list of columns and in the primary key. The schema will then look like this:

Notice that a key (primary or secondary) can be composed from more columns,
in which case you have to specify them separated by space.

To understand why the schema looks like this, we must first see which
queries are performed on the location table.
(The 'callid' condition was ignored as it doesn't really have a well defined role in the SIP RFC).

When Invite received, lookup location: select where username='..'.

When Register received, update registration:
update where username='..' and contact='..'.

So, the relation between these keys is the following:

The unique key for a table is actually the combination of row key + secondary key.

A row defined by a row key will contain more records with different secondary keys.

The timestamp column that leaves the Cassandra cluster to deal with deleting the expired
records. For this to work right we needed to modify a bit the behavior of usrloc module and
replace update sql query performed at re-registration with an insert sql query (so that all
columns are updated and the new timestamp is set for all columns).
This behavior is enabled by setting a parameter in the usrloc module
db_update_as_insert:

...
modparam("usrloc", "db_update_as_insert", 1)
...

Also you should disable in usrloc module the timer routine that checks for expired records.
You can do this by setting the timer interval to 0.
timer_interval:

...
modparam("usrloc", "timer_interval", 0)
...

The alternative would have been to define an index on the expire column and
run a external job to periodically delete the expired records. However,
obviously, this would be more costly.

7. Limitations

The module can be used only when the queries use only one index, which is also the
unique key, or have two indexes that form the unique key like in the usrloc usage.