Writing Cassandra Applications using the Hector Object Mapper

Apache Cassandra is an open source NoSQL database designed to handle massive scalability challenges. To achieve this goal, Cassandra uses an unorthodox data model that is quite different from the commonly used relational model. Essentially, Cassandra is one big table in which each row can have many columns, and those columns are grouped into column families. This organization allows data to be spread easily across the nodes of a Cassandra cluster (thereby enabling scalability), but it presents a challenge to developers designing applications for Cassandra.
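One way to picture this model is as a map of maps: a row key points to a sorted set of named columns, and different rows need not have the same columns. The following Java sketch is only a mental model under that assumption, not how Cassandra stores or distributes data. The class and method names (ColumnFamilySketch, insert, get, columnCount) are made up for illustration.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative in-memory picture of a column family: row key -> (column name -> value). */
final class ColumnFamilySketch {

    // Rows are keyed by row key; each row keeps its columns sorted by column name.
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    /** Writes one column; rows do not have to share the same set of columns. */
    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Returns the column value, or null when the row or column is absent. */
    String get(String rowKey, String columnName) {
        SortedMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnName);
    }

    /** Number of columns stored for a row (0 if the row does not exist). */
    int columnCount(String rowKey) {
        SortedMap<String, String> columns = rows.get(rowKey);
        return columns == null ? 0 : columns.size();
    }

    public static void main(String[] args) {
        ColumnFamilySketch users = new ColumnFamilySketch();
        users.insert("alice", "email", "alice@example.com");
        users.insert("alice", "city", "Lahore");
        users.insert("bob", "email", "bob@example.com"); // bob has no "city" column

        System.out.println(users.get("alice", "city"));
        System.out.println(users.columnCount("bob"));
    }
}
```

Because every row is addressed by its key alone, rows can be placed on different nodes independently of one another, which is the property the paragraph above relies on for scalability.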

To make Cassandra’s data model easier to work with, many “high-level” wrappers around it have been written. One such wrapper, available for the Java programming language, is Hector. Hector is one of the better choices since it:

Exposes a simple, object-oriented interface for Cassandra

Supports failover behavior on the client side

Provides connection pooling for performance improvement

Has JMX support

In this post, we will discuss how to use the Hector Object Mapper to map Plain Old Java Objects (POJOs) to the Cassandra data model. Using POJOs avoids the need for complex object hierarchies to meet even simple data modeling needs. They also allow data to be modeled and manipulated as simple, self-contained objects. For another example that uses the Hector Object Mapper with Cassandra, see Chapter 10 of Cassandra High Performance Cookbook.

The first task is to define a simple POJO. The following code demonstrates how this can be done.

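package com.mycompany;

import java.util.UUID;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "TestColumnFamily")
public class MyPojo {

    // Unique ID used to look the object up again
    @Id
    private UUID id;

    // Stored in the column named "lp1"
    @Column(name = "lp1")
    private long longProp1;

    // Stored in the column named "sp2"
    @Column(name = "sp2")
    private String strProp2;

    public UUID getId() {
        return id;
    }

    public void setId(UUID id) {
        this.id = id;
    }

    public long getLongProp1() {
        return longProp1;
    }

    public void setLongProp1(long longProp1) {
        this.longProp1 = longProp1;
    }

    public String getStrProp2() {
        return strProp2;
    }

    public void setStrProp2(String strProp2) {
        this.strProp2 = strProp2;
    }
}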

The @Entity annotation allows the EntityManager (discussed below) to identify the class as an entity. The name provided with the @Table annotation specifies the column family of the entity; if it is omitted, the column family name defaults to the name of the class. The @Id annotation marks the field that holds the object’s unique ID. The @Column annotations specify the “fields”, or columns, of the entity. Again, the name is used to specify the column name.
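To make the role of these annotations more concrete, here is a rough sketch of how a mapper could use reflection to turn annotated fields into column name and value pairs. It is not Hector's implementation: the Col annotation, the Sample class, and the toColumns method are hypothetical stand-ins defined only for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

final class AnnotationMappingSketch {

    /** Hypothetical stand-in for a column annotation; not the one Hector uses. */
    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at run time
    @Target(ElementType.FIELD)
    @interface Col {
        String name();
    }

    /** Small entity whose annotated fields become columns. */
    static final class Sample {
        @Col(name = "lp1")
        private long longProp1 = 123L;

        @Col(name = "sp2")
        private String strProp2 = "hello";

        private String notMapped = "ignored"; // no annotation, so it is skipped
    }

    /** Collects column name -> value for every field that carries @Col. */
    static Map<String, Object> toColumns(Object entity) {
        Map<String, Object> columns = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            Col col = field.getAnnotation(Col.class);
            if (col == null) {
                continue; // unannotated fields are not persisted
            }
            field.setAccessible(true); // the fields are private
            try {
                columns.put(col.name(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Prints the two mapped columns; notMapped does not appear.
        System.out.println(toColumns(new Sample()));
    }
}
```

The point of the sketch is that the column name comes from the annotation rather than from the Java field name, which is why longProp1 ends up under "lp1".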

Next, we need to initialize the EntityManager and use it to create and persist objects. This can be done as follows:

First, we initialize the cluster and keyspace objects in which we want to store data. Next, we create an EntityManager object and pass “com.mycompany” to its constructor, which instructs it to scan for entities under that package. We then create a MyPojo object, store data in its columns, and persist it by calling the save method on the EntityManager. Finally, we create another object by reading the newly stored object back from the database using its unique ID.
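The save and load-by-ID steps described above can be sketched with a small in-memory stand-in. This is an illustration only: InMemoryEntityManager and Pojo are hypothetical classes written for this example, and nothing here talks to Cassandra. With Hector itself, the cluster and keyspace objects would typically be obtained through its HFactory class, and the entity manager would be the one shipped with the object mapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class SaveLoadSketch {

    /** Minimal entity with an ID and one property. */
    static final class Pojo {
        private UUID id;
        private long longProp1;

        UUID getId() { return id; }
        void setId(UUID id) { this.id = id; }
        long getLongProp1() { return longProp1; }
        void setLongProp1(long longProp1) { this.longProp1 = longProp1; }
    }

    /** Hypothetical in-memory stand-in for an entity manager; it is not Hector's class. */
    static final class InMemoryEntityManager {
        // Row key (the entity ID) -> column name -> value.
        private final Map<UUID, Map<String, Object>> rows = new HashMap<>();

        /** Persists the entity by copying its properties into a row keyed by its ID. */
        void save(Pojo pojo) {
            Map<String, Object> columns = new HashMap<>();
            columns.put("lp1", pojo.getLongProp1());
            rows.put(pojo.getId(), columns);
        }

        /** Builds a new object from the stored row, or returns null if the ID is unknown. */
        Pojo load(UUID id) {
            Map<String, Object> columns = rows.get(id);
            if (columns == null) {
                return null;
            }
            Pojo pojo = new Pojo();
            pojo.setId(id);
            pojo.setLongProp1((Long) columns.get("lp1"));
            return pojo;
        }
    }

    public static void main(String[] args) {
        InMemoryEntityManager em = new InMemoryEntityManager();

        // Create an object, fill in its properties, and persist it.
        Pojo pojo1 = new Pojo();
        pojo1.setId(UUID.randomUUID());
        pojo1.setLongProp1(123L);
        em.save(pojo1);

        // Read the stored row back into a second, independent object.
        Pojo pojo2 = em.load(pojo1.getId());
        System.out.println("longProp1 = " + pojo2.getLongProp1()); // prints "longProp1 = 123"
    }
}
```

Note that the loaded object is a new instance rebuilt from the stored columns, not a reference to the original, which mirrors what happens when an entity is read back from the database.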

There’s much more to persisting objects in Cassandra than is covered here. A complete chapter on clients can be found in Cassandra: The Definitive Guide, available in Safari Books Online.

Check out these Cassandra books available from Safari Books Online:

The rising popularity of Apache Cassandra rests on its ability to handle very large data sets that include hundreds of terabytes, which is why this distributed database has been chosen by organizations such as Facebook, Twitter, Digg, and Rackspace. With Cassandra: The Definitive Guide, you’ll get all the details and practical examples you need to understand Cassandra’s non-relational database design and put it to work in a production environment.

Apache Cassandra is a fault-tolerant, distributed data store that offers linear scalability, allowing it to serve as a storage platform for large, high-volume websites. Cassandra High Performance Cookbook provides detailed recipes that describe how to use the features of Cassandra and improve its performance. Recipes cover topics ranging from setting up Cassandra for the first time to complex multiple data center installations. The recipe format presents the information in a concise, actionable form.

About the author

Shaneeb Kamran is a hardcore programmer at heart and an aspiring entrepreneur by day. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products. He can be reached at me@shaneeb.com.