Using an Example of Sharding with Hibernate

Amazon developer manager James Horsley provides an example application that shows how to use Amazon RDS and Hibernate sharding to help scale your database.

Details

Submitted By: JamesH@AWS

AWS Products Used: Amazon RDS

Language(s): Java, SQL, XML

Created On: March 10, 2011 11:22 PM GMT

Last Updated: March 10, 2011 11:22 PM GMT

Overview

Amazon Relational Database Service (Amazon RDS) can be combined with sharding to help scale your database. In Scaling Databases With RDS we described how this combination works. Now we'll walk you through a practical example. In this article we'll build a code sample that shards news articles with Amazon RDS and Hibernate Shards. The news application needs to store and retrieve news articles across a range of topics, and we will shard the articles by category. Our application has seven categories: business, entertainment, health, science, sports, technology, and world.

Virtual Shards

In this example, we assume our application doesn't yet have enough load to need a physical shard for each category, but we want to plan ahead with growth in mind. To make future growth easier we use virtual shards: our application code will act as if it has seven shards, but Hibernate will map those seven virtual shards onto a smaller number of physical shards. Each physical shard will map to a MySQL database instance. By using this mapping we can distribute the load to best suit our needs. For our application, assume that sports and entertainment generate as much load as the other five categories combined. These two categories will map to one physical shard and the other five categories will map to the other physical shard. The two physical shards will be mapped as follows.
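One way to picture such a mapping is the Java sketch below. The virtual shard IDs (0 through 6, assigned alphabetically by category), the physical shard IDs (0 and 1), and the class and method names are all assumptions made for illustration; the only facts taken from the text are that sports and entertainment share one physical shard and the other five categories share the other.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative virtual-to-physical shard mapping; every ID here is an assumption. */
final class VirtualShardMapSketch {

    // Assumed virtual shard IDs, one per category, in alphabetical order.
    static final int BUSINESS = 0;
    static final int ENTERTAINMENT = 1;
    static final int HEALTH = 2;
    static final int SCIENCE = 3;
    static final int SPORTS = 4;
    static final int TECHNOLOGY = 5;
    static final int WORLD = 6;

    // Assumed physical shard IDs, one per MySQL DB Instance.
    static final int GENERAL_SHARD = 0;   // business, health, science, technology, world
    static final int HIGH_LOAD_SHARD = 1; // sports and entertainment

    private VirtualShardMapSketch() {
    }

    /** Returns a read-only map from virtual shard ID to physical shard ID. */
    static Map<Integer, Integer> virtualToPhysical() {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        map.put(BUSINESS, GENERAL_SHARD);
        map.put(ENTERTAINMENT, HIGH_LOAD_SHARD);
        map.put(HEALTH, GENERAL_SHARD);
        map.put(SCIENCE, GENERAL_SHARD);
        map.put(SPORTS, HIGH_LOAD_SHARD);
        map.put(TECHNOLOGY, GENERAL_SHARD);
        map.put(WORLD, GENERAL_SHARD);
        return Collections.unmodifiableMap(map);
    }

    /** Looks up the physical shard for a virtual shard, rejecting unknown IDs. */
    static int physicalShardFor(int virtualShardId) {
        Integer physical = virtualToPhysical().get(virtualShardId);
        if (physical == null) {
            throw new IllegalArgumentException("Unknown virtual shard: " + virtualShardId);
        }
        return physical;
    }
}
```

Under this layout, moving a category to a new physical shard later would mean changing one entry in the map (and migrating that category's rows), while the seven virtual shard IDs the application sees stay the same.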

Creating Amazon RDS DB Instances

For our sample we need two database instances, one for each of our physical shards. To reduce the schema setup overhead we use the snapshot and restore capabilities of Amazon RDS to do the following:

Create a single "seed" DB Instance.

Set up the schema on that database.

Snapshot the database.

Create more databases from that snapshot using the RDS RestoreDBInstanceFromDBSnapshot API.

Create the Schema on the Seed DB Instance

Once the seed database is up and running, you can connect to it using a standard SQL client and create the schema. In this example we use a simple, single-table schema. The article_id column is going to be generated by Hibernate's ShardedUUIDGenerator, so the column type will end up being fairly large.
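To get a feel for how large the identifier column needs to be, the sketch below works out how many characters an identifier of a given bit width occupies. It assumes the generated IDs are 128-bit values, the usual size of a UUID; that width and the helper names are assumptions for illustration, not details taken from the generator itself.

```java
import java.math.BigInteger;

/** Rough sizing arithmetic for an ID column; the 128-bit width is an assumption. */
final class IdColumnWidthSketch {

    private IdColumnWidthSketch() {
    }

    /** Decimal digits needed to store the largest unsigned value of the given bit width. */
    static int decimalDigitsFor(int bits) {
        BigInteger max = BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
        return max.toString().length();
    }

    /** Hexadecimal characters needed for the given bit width (4 bits per character). */
    static int hexCharsFor(int bits) {
        return (bits + 3) / 4;
    }
}
```

With these helpers, decimalDigitsFor(128) evaluates to 39 and hexCharsFor(128) to 32, compared with 20 decimal digits for a 64-bit value, so a numeric column holding such IDs needs roughly twice the width of one sized for ordinary 64-bit keys.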

Once the snapshot is available, it can be used to create any number of DB Instances. We're only creating two DB Instances in this example, but it could easily be many more. For our trivial setup, the only efficiency we gain is that we don't need to create the schema on each instance. In a typical scenario, however, there might be a lot more setup involved in creating the seed (e.g., application configuration data, user accounts, permissions, etc.).

The following command can be used to create new DB Instances from the seed snapshot:

Hibernate Configuration and Setup

The Hibernate configuration for sharding is very similar to the non-sharded configuration. See the Hibernate Reference Docs for details.

shard.*.hibernate.cfg.xml

There needs to be one of these files for each physical database to which the application will connect. Each Hibernate configuration file refers to a physical shard ID, typically starting at zero and incrementing for each new shard. See the Virtual Shards section of the Amazon RDS/EDS Tech Tips for how we're mapping virtual and physical shard IDs.

Model/Entity

Category.java

In our example the application shards articles by category. The list of categories is fairly stable, so we can use an enum to represent them. The mapping of categories to virtual shard IDs is maintained in the SessionFactoryHelper.
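As a minimal sketch of what such an enum and its category-to-virtual-shard lookup might look like, consider the following. The seven constants come from the categories listed in the overview; the CategoryShardLookup class, its method names, and the choice of IDs 0 through 6 in declaration order are hypothetical stand-ins and are not the SessionFactoryHelper from the sample application.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

/** The seven article categories the application shards on. */
enum Category {
    BUSINESS, ENTERTAINMENT, HEALTH, SCIENCE, SPORTS, TECHNOLOGY, WORLD
}

/** Hypothetical lookup from category to virtual shard ID (assumed to be 0 through 6). */
final class CategoryShardLookup {

    private static final Map<Category, Integer> VIRTUAL_SHARD_IDS = new EnumMap<>(Category.class);

    static {
        // Assumption: virtual shard IDs follow the enum's declaration order.
        int nextId = 0;
        for (Category category : Category.values()) {
            VIRTUAL_SHARD_IDS.put(category, nextId++);
        }
    }

    private CategoryShardLookup() {
    }

    /** Returns the virtual shard ID assigned to the given category. */
    static int virtualShardIdFor(Category category) {
        return VIRTUAL_SHARD_IDS.get(category);
    }

    /** Parses a category name such as "sports", ignoring case and surrounding spaces. */
    static Category fromName(String name) {
        return Category.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}
```

Keeping the IDs in a separate lookup rather than on the enum itself mirrors the split described above, where the enum only names the categories and the shard assignment lives in a helper.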