Deciding to use a managed NoSQL datastore is a great step towards running a fast, scalable and resilient application without needing to be an expert in highly available architecture. But how do you know which technology is best for your application, or whether a provider's performance claims are true? You are putting your application on someone else’s infrastructure, and that demands hard answers about those claims.

To determine whether a provider is suitable, you need to benchmark it. Choosing a service provider usually happens in stages. The first is to shortlist providers based on capabilities and claimed performance, ruling out those that do not meet your application's requirements. The second is to look for benchmarks conducted by third parties, if any exist. The final stage is to benchmark the service yourself.

In this article we will show you how to run some preliminary benchmarks against two managed NoSQL systems: Instaclustr and Amazon DynamoDB. We will use the Yahoo Cloud Serving Benchmark (YCSB) for the tests. Instaclustr provides managed Apache Cassandra hosting, and DynamoDB is Amazon's own managed key-value store.

Cassandra and DynamoDB are architecturally similar. Both the Instaclustr and DynamoDB services also run on Amazon's cloud infrastructure, which makes this a useful like-for-like performance comparison.

Step 1 - Client Setup

YCSB is a testing client for cloud serving systems. It performs reads, writes and updates according to specified workloads. Run from the command line, it can create an arbitrary number of threads that query the system under test. It measures throughput in operations per second and records the latency of those operations.
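To make "throughput" and "latency" concrete, here is a toy Python driver that roughly mirrors what a YCSB client does. It is not YCSB's code. `run_closed_loop` and `fake_db_call` are hypothetical names, and the fake call merely stands in for a real database request. Several threads each issue operations back to back. Throughput is the total number of operations divided by wall-clock time, and the 95th and 99th percentiles come from the per-operation latencies.

```python
import random
import statistics
import threading
import time


def run_closed_loop(operation, threads=4, ops_per_thread=500):
    """Drive `operation` from several threads and summarise the results.

    Each thread issues its next call as soon as the previous one returns
    (a closed loop), timing every call in milliseconds.
    """
    latencies_ms = []
    lock = threading.Lock()

    def worker():
        local = []
        for _ in range(ops_per_thread):
            start = time.perf_counter()
            operation()
            local.append((time.perf_counter() - start) * 1000.0)
        with lock:
            latencies_ms.extend(local)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    began = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - began

    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "operations": len(latencies_ms),
        "throughput_ops_per_sec": len(latencies_ms) / elapsed,
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }


def fake_db_call():
    """Hypothetical stand-in for a database request: sleeps 0-2 ms."""
    time.sleep(random.uniform(0, 0.002))


if __name__ == "__main__":
    print(run_closed_loop(fake_db_call))
```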

YCSB can run in parallel from multiple hosts. For this article we will deploy YCSB across four High-CPU Extra Large EC2 instances (c1.xlarge), but you can just as easily run it from a single instance. Consider using EC2 spot instances to reduce the cost of benchmarking.

Launch each instance with an Ubuntu-based AMI. Make a note of the security group you assign to the instances and the region you deploy them in. Once the instances have booted, SSH into each instance (replacing ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com with the DNS name of your instance):

Step 2 - Service Setup

When creating your Instaclustr cluster, select the region you wish to deploy to. Make sure it is the same region that contains your test client instances (created in step 1). For this test we will choose Instaclustr's professional tier. Its price roughly matches that of the highest-capacity DynamoDB table you can request without asking Amazon to raise your account limits.

Instaclustr will ask you to select an SSH key to associate with the cluster (you can also generate a new key from the cluster creation page). Accept the terms and conditions and click “Create Cluster”. This will cost around $8.88 per hour plus transfer and S3 costs.

While Instaclustr deploys your Cassandra cluster, we will create the DynamoDB table.

Set the name of your DynamoDB table to usertable and the Primary Key Type to Hash. Set the Hash Attribute Name to firstname. Click Next and set both the Read and Write capacity units to 10000. Click Next until done, skipping the CloudWatch configuration page, to create the table. This will cost around $7.80 per hour plus storage and transfer costs.
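The $7.80 hourly figure can be reconstructed from DynamoDB's provisioned-throughput pricing. This assumes the historical us-east-1 rates of $0.0065 per hour for every 10 write capacity units and $0.0065 per hour for every 50 read capacity units; check the current pricing page, as rates change. At those rates, 10,000 write units cost $6.50 and 10,000 read units cost $1.30 per hour. Here is a minimal Python sketch (the function name is hypothetical):

```python
# ASSUMPTION: historical us-east-1 provisioned-throughput rates (USD per hour).
# Check current DynamoDB pricing before relying on these figures.
USD_PER_HOUR_PER_10_WRITE_UNITS = 0.0065
USD_PER_HOUR_PER_50_READ_UNITS = 0.0065


def provisioned_cost_per_hour(write_units, read_units):
    """Hourly cost of provisioned read/write capacity only
    (excludes storage and data transfer)."""
    write_cost = write_units / 10 * USD_PER_HOUR_PER_10_WRITE_UNITS
    read_cost = read_units / 50 * USD_PER_HOUR_PER_50_READ_UNITS
    return write_cost + read_cost


if __name__ == "__main__":
    # The table configured above: 10,000 write units and 10,000 read units.
    print(f"${provisioned_cost_per_hour(10_000, 10_000):.2f} per hour")  # $7.80 per hour
```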

While you are in the AWS console, go to the EC2 console page. Click Security Groups and allow All TCP, All UDP and All ICMP traffic between two groups: the test client security group you selected in step 1, and your Instaclustr security group (called instaclustr-yourname-yourclustername-group). You can do this by entering the other group's security group ID in the Source box when adding security group rules.
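If you prefer to script these rules, the same permissions can be expressed in the `IpPermissions` structure that boto3's EC2 client method `authorize_security_group_ingress` accepts. The sketch below is hypothetical: the helper names are invented, the group IDs come from your own console, and opening traffic in both directions is our reading of "between".

```python
def allow_all_from(source_group_id):
    """IpPermissions entries allowing all TCP, UDP and ICMP from one group."""
    def rule(protocol, from_port, to_port):
        return {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }

    return [
        rule("tcp", 0, 65535),
        rule("udp", 0, 65535),
        rule("icmp", -1, -1),  # -1/-1 means every ICMP type and code
    ]


def open_between(ec2, client_group_id, instaclustr_group_id):
    """Allow traffic in both directions; `ec2` is a boto3 EC2 client."""
    ec2.authorize_security_group_ingress(
        GroupId=instaclustr_group_id, IpPermissions=allow_all_from(client_group_id))
    ec2.authorize_security_group_ingress(
        GroupId=client_group_id, IpPermissions=allow_all_from(instaclustr_group_id))
```

You would call it as `open_between(boto3.client("ec2", region_name="us-east-1"), client_sg_id, instaclustr_sg_id)` with your own region and group IDs. This requires boto3 and credentials allowed to modify both groups.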

Your DynamoDB table will take a little while to create. While you wait, let's go back to the test clients and configure the tests.

Step 3 - Test configuration

Edit the file ~/YCSB/dynamodb/conf/AWSCredentials.properties and set the AccessKey and SecretKey properties to those of your account (remember to uncomment the lines by deleting the #).

Edit ~/YCSB/dynamodb/conf/dynamodb.properties and set the path to the AWSCredentials.properties file. If you created your test clients in a region other than us-east-1, change the DynamoDB endpoint to match. Leave the rest of the settings as they are.
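The DynamoDB endpoint must be in the same region as your test clients, so you can derive it from a client's public DNS name. Instances in us-east-1 get names ending in `.compute-1.amazonaws.com`, while other regions use `.<region>.compute.amazonaws.com`. The helpers below are hypothetical. Whether your endpoint setting needs the `https://` scheme is an assumption; compare it against the value already in dynamodb.properties.

```python
def region_from_public_dns(hostname):
    """Infer the AWS region from an EC2 public DNS name."""
    if hostname.endswith(".compute-1.amazonaws.com"):
        return "us-east-1"
    suffix = ".compute.amazonaws.com"
    if hostname.endswith(suffix):
        # The label just before ".compute" is the region name.
        return hostname[: -len(suffix)].rsplit(".", 1)[-1]
    raise ValueError(f"not an EC2 public DNS name: {hostname}")


def dynamodb_endpoint(region):
    """Regional DynamoDB endpoint URL (scheme is an assumption)."""
    return f"https://dynamodb.{region}.amazonaws.com"


if __name__ == "__main__":
    # Placeholder hostname in the same style as the article's examples.
    host = "ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com"
    print(dynamodb_endpoint(region_from_public_dns(host)))
```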

Your dynamodb properties file should contain something similar to the following:

Go back to your Instaclustr dashboard and open OpsCenter for your cluster. Click Data Modeling (in the left-hand menu), then click Add Keyspace. Set the Name to usertable and replication_factor to 3. Untick “I would like to create a Column Family” and click Save Keyspace. Return to your Instaclustr dashboard and note the public DNS names of the nodes in your Cassandra cluster.
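A replication_factor of 3 means Cassandra keeps three copies of every row. This matters when sizing the cluster for the 10,000,000 records (recordcount) loaded later. The estimate assumes YCSB's default record shape of 10 fields of 100 bytes each (its fieldcount and fieldlength properties). The helper name is hypothetical. It ignores row keys, column names, compression and Cassandra's storage overhead, so treat the result as a rough estimate only.

```python
def replicated_payload_bytes(recordcount, replication_factor=3,
                             fieldcount=10, fieldlength=100):
    """Raw field bytes stored across the whole cluster.

    Ignores row keys, column names, compression and Cassandra's on-disk
    overhead, so actual disk usage can differ noticeably.
    """
    return recordcount * fieldcount * fieldlength * replication_factor


if __name__ == "__main__":
    total = replicated_payload_bytes(10_000_000)
    print(f"{total / 1e9:.0f} GB across the cluster")  # 30 GB across the cluster
```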

On one of your test client instances run the following command, replacing ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com with the DNS name for one of your Cassandra nodes:

On each of your test client instances, create a file in the ~/YCSB directory called Cassandra.props containing the following. Set the insertstart property to 0, 2500000, 5000000 and 7500000 on the first, second, third and fourth test clients respectively. This way each client writes a different portion of the data to the cluster. Set the hosts property to the DNS names of your Cassandra nodes, replacing ec2-XX-XX-XX-XX.compute-1.amazonaws.com etc. with your own.

recordcount=10000000
# insertstart should be different for each client instance
insertstart=0
insertcount=2500000
hosts=ec2-XX-XX-XX-XX.compute-1.amazonaws.com,ec2-YY-YY-YY-YY.compute-1.amazonaws.com,ec2-AA-AA-AA-AA.compute-1.amazonaws.com …
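We assume YCSB loads `-P` property files with Java's `java.util.Properties`; check this against your YCSB version. In that format, `#` starts a comment only at the beginning of a line. A comment written after a value on the same line therefore becomes part of that value, and insertstart would no longer parse as an integer. The Python sketch below mimics just that rule. `parse_properties` and `check_ycsb_props` are hypothetical helpers, not part of YCSB. Use them to lint a props file before starting a long load.

```python
def parse_properties(text):
    """Simplified java.util.Properties-style parser.

    Only lines whose first non-blank character is '#' or '!' are comments;
    a '#' after a value stays part of the value. Escapes, ':' separators
    and line continuations are deliberately not handled in this sketch.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.lstrip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.lstrip()
    return props


def check_ycsb_props(props):
    """List problems that would stop YCSB using the values as intended."""
    problems = []
    for key in ("recordcount", "insertstart", "insertcount"):
        if key in props and not props[key].isdigit():
            problems.append(f"{key} is not a plain integer: {props[key]!r}")
    hosts = props.get("hosts", "")
    if any(h != h.strip() for h in hosts.split(",")):
        problems.append("hosts contains spaces around commas")
    return problems
```

For example, `check_ycsb_props(parse_properties(pathlib.Path("Cassandra.props").read_text()))` should return an empty list for a well-formed file.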

Create a file in the ~/YCSB directory called dynamo.props and fill it with the same settings as Cassandra.props, leaving out the hosts setting. Again, set insertstart to 0, 2500000, 5000000 and 7500000 on the four test clients respectively. It should look something like this:

recordcount=10000000
# insertstart should be different for each client instance
insertstart=0
insertcount=2500000

If you are using a single test client, you can leave insertstart and insertcount out of both files.
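The insertstart values above come from dividing recordcount evenly among the test clients. If you use a different number of clients, a small helper can compute contiguous, non-overlapping ranges and spread any remainder across the first clients. `partition_inserts` is a hypothetical name.

```python
def partition_inserts(recordcount, clients):
    """Split [0, recordcount) into contiguous (insertstart, insertcount)
    ranges, one per client; the first clients absorb any remainder."""
    base, extra = divmod(recordcount, clients)
    ranges, start = [], 0
    for i in range(clients):
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    for i, (start, count) in enumerate(partition_inserts(10_000_000, 4)):
        print(f"client {i}: insertstart={start} insertcount={count}")
```

With a single client, it returns one range starting at 0 that covers every record. That should match YCSB's behaviour when both properties are omitted.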

Once the workload A data is loaded into both Instaclustr and DynamoDB, you can execute different YCSB workloads. Each workload tests a different mix of operations: reads, updates, inserts and scans. Every workload has a load phase and a run phase. The load phase pre-fills the database with the required data, and the run phase performs the workload being measured. For example, the run phase of workload A (which we used to load the data) is a 50/50 split of read and update operations. The load phase of workload A is similar to that of the other workloads, so the data it generates can be reused.

Other workloads have different usage patterns and may better represent your specific use case. You can also use the load phase of workload A as a measure of write performance.
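The workload mix also determines how much of DynamoDB's provisioned capacity each operation consumes. That caps the throughput YCSB can reach, however fast the service itself is. One write capacity unit covers one write per second of an item up to 1 KB. One read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The sketch below combines those rules with the core workload mixes, leaving out workload E, whose scans are costed by the amount of data read. Two things here are assumptions: the function name, and the ~1,100-byte item size (YCSB's default ten 100-byte fields plus attribute names and key).

```python
import math

# Expected reads and writes per operation for YCSB's core workloads.
# A read-modify-write (workload F) counts as one read plus one write.
WORKLOAD_MIX = {
    "A": {"reads": 0.50, "writes": 0.50},  # 50% read, 50% update
    "B": {"reads": 0.95, "writes": 0.05},  # 95% read, 5% update
    "C": {"reads": 1.00, "writes": 0.00},  # 100% read
    "D": {"reads": 0.95, "writes": 0.05},  # 95% read, 5% insert
    "F": {"reads": 1.00, "writes": 0.50},  # 50% read, 50% read-modify-write
}


def max_ops_per_sec(workload, write_units, read_units, item_bytes,
                    consistent_reads=True):
    """Ops/sec at which provisioned capacity, not the service, becomes the limit."""
    mix = WORKLOAD_MIX[workload]
    # One write unit: one write/sec of up to 1 KB, rounded up per item.
    write_cost = math.ceil(item_bytes / 1024)
    # One read unit: one strongly consistent read/sec of up to 4 KB,
    # or two eventually consistent reads.
    read_cost = math.ceil(item_bytes / 4096) * (1.0 if consistent_reads else 0.5)
    limits = []
    if mix["writes"]:
        limits.append(write_units / (mix["writes"] * write_cost))
    if mix["reads"]:
        limits.append(read_units / (mix["reads"] * read_cost))
    return min(limits)


if __name__ == "__main__":
    # ASSUMPTION: ~1,100-byte items.
    for name in WORKLOAD_MIX:
        print(name, round(max_ops_per_sec(name, 10_000, 10_000, 1_100)))
```

Take the table above, with 10,000 units each and 1,100-byte items. Workload A would be limited by writes to about 10,000 operations per second. A DynamoDB result near that figure may therefore reflect the provisioned capacity rather than the service's ceiling.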

At the end of each test, YCSB outputs a summary. You will find it in the results files you redirected the output to in the YCSB directory (workloada-cassandra.results, etc.).
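YCSB prints its final summary to standard output, so each results file is simply that output redirected. A small Python driver can do the redirection consistently across workloads and targets. The function names are hypothetical, and so are several details: the binding names (`cassandra-10`, `dynamodb`), the thread count and the `-load` file suffix. Running `./bin/ycsb` without arguments should list the bindings your build accepts.

```python
import subprocess

# ASSUMPTION: binding names accepted by this YCSB build's bin/ycsb script.
BINDINGS = {"cassandra": "cassandra-10", "dynamo": "dynamodb"}
PROPS_FILES = {"cassandra": "Cassandra.props", "dynamo": "dynamo.props"}


def ycsb_command(phase, workload, target, threads=100):
    """Argument list for one YCSB 'load' or 'run' phase."""
    return [
        "./bin/ycsb", phase, BINDINGS[target],
        "-P", f"workloads/{workload}",  # the workload definition
        "-P", PROPS_FILES[target],      # per-client settings from step 3
        "-threads", str(threads),       # ASSUMPTION: 100 client threads
        "-s",                           # periodic status on stderr
    ]


def run_phase(phase, workload, target, threads=100):
    """Run one phase from ~/YCSB, saving YCSB's stdout (the summary) to a file.

    Run output goes to e.g. workloada-cassandra.results; load output gets
    a -load suffix so it does not overwrite the run results.
    """
    suffix = "" if phase == "run" else "-load"
    results_path = f"{workload}-{target}{suffix}.results"
    with open(results_path, "w") as out:
        subprocess.run(ycsb_command(phase, workload, target, threads),
                       stdout=out, check=True)
    return results_path
```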

Step 5 - The results

Evaluate the results against your application's throughput and latency requirements. Remember to sum the average throughput of each YCSB client to get the total average throughput. YCSB also reports the 95th and 99th percentile latencies for each test.
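Summing by hand across four clients is error-prone. The small Python sketch below pulls the `[OVERALL]` throughput line from each client's results file and adds them up. The regular expression assumes YCSB's comma-separated summary lines, such as `[OVERALL], Throughput(ops/sec), 1234.5`, and the script name in the usage comment is hypothetical. Percentiles behave differently: they cannot be summed or averaged across clients. The largest per-client 95th or 99th percentile is an upper bound on the combined figure.

```python
import re
import sys
from pathlib import Path

# Matches summary lines such as: [OVERALL], Throughput(ops/sec), 1234.5
THROUGHPUT_LINE = re.compile(
    r"^\[OVERALL\],\s*Throughput\(ops/sec\),\s*([0-9.eE+-]+)\s*$", re.MULTILINE)


def client_throughput(results_text):
    """Overall ops/sec reported by one YCSB client."""
    match = THROUGHPUT_LINE.search(results_text)
    if match is None:
        raise ValueError("no [OVERALL] throughput line found")
    return float(match.group(1))


def total_throughput(results_texts):
    """Clients ran concurrently against the same service, so throughputs add."""
    return sum(client_throughput(text) for text in results_texts)


if __name__ == "__main__":
    # Usage: python sum_throughput.py client1.results client2.results ...
    texts = [Path(p).read_text() for p in sys.argv[1:]]
    print(f"total throughput: {total_throughput(texts):.1f} ops/sec")
```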
