What’s New in Cassandra 2.2: JSON Support

One of the several improvements in Cassandra 2.2 is the extension of CQL to make it easier to work with JSON documents. The SELECT and INSERT statements now include a JSON-focused variant, and two new native functions have been added to convert to and from JSON.

JSON != Schemaless

When designing this feature, we wanted to ensure that users would continue to work with data in a type-safe, schema-enforced way. For that reason, working through JSON does not allow you to bypass Cassandra's schema. All data is still validated against the same types, and the schema must be manually defined up-front with a normal CREATE TABLE statement.

INSERT JSON

The INSERT statement now accepts a JSON variant. Suppose we have a table defined like this:

CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text
);

Normally we would insert a row like this:

INSERT INTO users (id, age, state) VALUES ('user123', 42, 'TX');

The JSON version looks like this:

INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';

The JSON-encoded map is simply a CQL string literal that is a JSON encoding of a map where keys are column names and values are column values. This means that drivers don't need to do anything special to support INSERT JSON. For example, with the python driver, you could prepare and execute the statement like so:

Type Interpretation

When Cassandra types have a sensible native JSON equivalent, such as ints, floats, booleans, and lists, those native types are accepted. For Cassandra types that don't have a clear JSON equivalent, such as UUIDs, a string representation matching the normal CQL literal format should be used.

For example, in CQL you can represent a UUID with a literal like this:

Omitted Columns

Columns which are omitted from the JSON value map are treated as a null insert (which results in an existing value being deleted, if one is present).

Non-text Map Keys

The JSON specification does not allow for non-text map keys. However, Cassandra's map type does support non-text keys. In order to support non-text keys, Cassandra will accept JSON-encoded string representations of any type as a map key [1].

Note that the map key is a JSON encoding of the list, not a normal CQL string literal, so double-quotes are used to surround the text items (and need to be escaped).

Case-sensitive Column Names

The INSERT JSON value map uses column names for the top-level keys. As with normal CQL, these column names are case-insensitive. So, for example, if you have a table like this:

CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text
);

The following INSERT would work just fine:

INSERT INTO users JSON '{"ID": "user123", "Age": 42, "StAtE": "TX"}';

On the other hand, if your table is declared with case-sensitive column names, you will need to use slightly special column names in your JSON value map. Suppose our table is instead defined like this:

CREATE TABLE users (
id text PRIMARY KEY,
"Age" int,
"State" text
);

The "Age" and "Stage" columns are case-sensitive. In the JSON value map, you must match the capitalization and add an extra set of double-quotes to the column names:

SELECT JSON

The SELECT statement has also be extended to support retrieval of rows in a JSON-encoded map format. The results for SELECT JSON will only include a single column named [json]. This column will contain the same JSON-encoded map representation of a row that is used for INSERT JSON. For example, if we have a table like the following:

The results of SELECT JSON are designed to be usable in an INSERT JSON statement without any modifications, so all of the same rules about non-text map keys and case-sensitive column names apply.

fromJson() and toJson()

INSERT JSON and SELECT JSON are designed to work with entire rows. When you only need to use JSON for a single column, the new toJson() and fromJson() functions can be used. These behave the same as INSERT JSON and SELECT JSON, but are limited to a single value or column.

fromJson()

The fromJson() function converts a single JSON-encoded string to a normal Cassandra value. For example, this can be used when performing an update:

UPDATE users SET age = fromJson('42') WHERE id = fromJson('"user123"');

The only place where fromJson() cannot be used is the selection clause of SELECT statements. (This is because Cassandra can't know in advance what type the result will be.)

toJson()

The toJson() function is the inverse of fromJson(). It can be used to convert any column to a JSON representation. For example:

The toJson() function can only be used in the selection clause of SELECT statements.

Summary

Cassandra 2.2 makes it easier to work with JSON documents without sacrificing the benefits of schema enforcement. Try it out and let us know what you think!

Footnotes

[1]: For the sake of consistency, it will accept string representations of types anywhere, not just in map keys. However, for clarity and performance reasons I don't suggest using this unless you need to.

DataStax has many ways for you to advance in your career and knowledge.

You would typically use the fromJson() function. For example: UPDATE mytable SET data = fromJson(?) WHERE id = ?

If you’re overwriting the entire row, you can of course just use the normal INSERT JSON syntax. There’s also a Cassandra ticket to support “unset” values for columns that are omitted from INSERT JSON, which would make it easier to partially update a row in some cases.

When doing upserts from say a C# application, how does the overhead/performance of sending JSON compare to using bind variables against a prepared statement (assuming the JSON insert is also done using a prepared statement)?