Cassandra-stress

Video

Cassandra-stress is great for stress testing your cluster, whether for benchmarking or for load testing. In this unit, we will show you use cases for this nifty tool.

Transcript:

Alright, we’re through making sure your cluster is as close to the right size as possible. What’s next? Stress testing your cluster for benchmarking or for load testing. No need to stress about it... we have a tool: Cassandra-stress! See what I did there?

What is Cassandra-stress? Weren’t you listening? I just said it. It’s a tool for benchmarking and load testing your cluster by simulating a user-defined load. Cassandra-stress can be used to do the following things: check out your schema performance, figure out how your database will scale, optimize your data model, and figure out your capacity in a production environment. So let me sum up. Cassandra-stress is a tool that will help you try out your database before you switch over to production.

So what do the yaml file and cassandra-stress have to do with each other? Well! You can configure cassandra-stress in your yaml file. Tricky, huh? You can define your schema, specify a compaction strategy, and create a characteristic workload. You probably could have read those three bullets yourself, huh?

The yaml file is broken into a few pieces: the Schema Description, which defines the keyspace and tables; Column Descriptions, which outline how to create the simulated data; Batch Descriptions, which define the data insertion pattern; and Query Descriptions, which define the queries the test can perform. We will cover each of these in a little more detail in the upcoming slides. Stay tuned!

This section defines the keyspace and tables. If the schema already exists, the test just uses the existing keyspace and tables. If it doesn’t exist, the test will create the schema. Let’s look at a real example.

Ooooooh, a real example! The top section names the keyspace and then uses standard CQL to create the keyspace with its replication strategy. The lower part here holds the CQL table definitions. Hopefully you’ve already seen some CQL so you don’t need me to read you a slide.
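For anyone following along without the slide, a schema section of this kind might look roughly like the sketch below. The keyspace name, table name, and columns are assumptions chosen for illustration, not the contents of the slide. The top-level keys (keyspace, keyspace_definition, table, table_definition) follow the example profiles that ship with cassandra-stress.

```yaml
# Sketch of the schema section of a cassandra-stress profile.
# Keyspace, table and column names are assumptions for illustration.
keyspace: stresscql

# CQL used to create the keyspace if it does not already exist
keyspace_definition: |
  CREATE KEYSPACE stresscql
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

table: blogposts

# CQL used to create the table if it does not already exist.
# The compaction strategy is set here, as part of the table definition.
table_definition: |
  CREATE TABLE blogposts (
    domain text,
    published_date timeuuid,
    url text,
    author text,
    title text,
    body text,
    PRIMARY KEY (domain, published_date)
  ) WITH CLUSTERING ORDER BY (published_date DESC)
    AND compaction = {'class': 'LeveledCompactionStrategy'}
```

Note that the compaction strategy is not a separate setting in the profile; in this sketch it simply rides along in the CREATE TABLE statement.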

Next, the yaml file can help with column definition. This allows us to define how we will generate data for each column. The generated data is contrived, but it is created in such a way as to simulate the patterns and frequency of your data. These generated values can follow standard distributions such as gaussian (also called normal) and others. Parameters include the data size, which is how many characters are in the data value; the value population, which is how often values recur; and finally the cluster distribution, which is the number of values for the column appearing in a partition.

I won’t insult you by reading these bullets to you, but take a quick look at the possible distributions supported in cassandra-stress. These will allow you to model data that most closely matches your “real” environment and datasets.

Now let’s see it in action in the actual fancy yaml file. Here is an example of where you specify your column definitions and apply a different distribution per column, should you need or want to.
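As a rough idea of what such a column section can look like, here is a minimal sketch. The column names are assumptions (they are not taken from the slide). The distribution names in the comments are the ones listed in the example profile bundled with cassandra-stress, and the list is not exhaustive.

```yaml
# Sketch of a columnspec section; column names are assumed.
# Distributions include, among others:
#   FIXED(val)                  always the same value
#   UNIFORM(min..max)           uniform over the range
#   GAUSSIAN(min..max,stdvrng)  gaussian/normal centered on the range midpoint
#   EXP(min..max)               exponential over the range
#   EXTREME(min..max,shape)     extreme value (Weibull) over the range
columnspec:
  - name: domain
    size: gaussian(5..100)        # data size: length of each generated value
    population: uniform(1..10M)   # value population: pool of distinct values
  - name: published_date
    cluster: fixed(1000)          # cluster distribution: values per partition
  - name: body
    size: gaussian(100..5000)     # a different distribution per column is fine
```

Columns left out of columnspec fall back to the tool's default size, population, and cluster settings, so you only need to list the columns whose shape matters to your test.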

Another section in the yaml describes batch configuration. This is where you would configure the batch type, the distribution ratio, and the partition distribution, which is the number of partitions to update per batch.

And back to the yaml file! We seem to spend a lot of time here! Trust me, it’s easier than I am making it seem. Here is where you will configure the cassandra-stress batch parameters.
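A batch section along these lines might look like the sketch below. The values are assumptions picked for illustration; the key names (partitions, batchtype, select) follow the example profiles bundled with cassandra-stress.

```yaml
# Sketch of an insert (batch) section; the values are assumed.
insert:
  partitions: fixed(1)     # partition distribution: partitions updated per batch
  batchtype: UNLOGGED      # batch type, for example LOGGED or UNLOGGED
  select: fixed(1)/1000    # distribution ratio: share of a partition's rows
                           # written in each batch
```

With 1000 rows per partition, a select of fixed(1)/1000 works out to roughly one row per batch, which approximates single-row inserts.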

Another cool thing you can do is define the queries you want to run in your cassandra-stress test by defining them under the queries section of the yaml file. The fields parameter defines whether the bind variables should come from the same row or from across all the rows in the partition.

Back to the yaml file. I swear this is the last time! Well, at least for this module. This is where you can specify the query or queries in CQL that will be executed for this test.
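A queries section could be sketched as follows. The query names singlepost and timeline are the ones used later in this unit; the CQL text and the blogposts table are assumptions made for illustration.

```yaml
# Sketch of a queries section; the CQL statements are assumed.
queries:
  singlepost:
    cql: select * from blogposts where domain = ? LIMIT 1
    fields: samerow    # bind variables are taken from a single row
  timeline:
    cql: select url, title, published_date from blogposts where domain = ? LIMIT 10
    fields: samerow    # the alternative, multirow, draws bind variables
                       # from all the rows in the partition
```

Each name defined here becomes something you can reference on the command line when you choose which operations to run.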

Let’s run an actual insert test with cassandra-stress.

On the command line, type the following: cassandra-stress user profile=blogpost.yaml ops\(insert=1\). It will start with 4 threads and keep increasing the thread count until an upper limit is hit. Inserts are done using the native transport, that is, CQL with prepared statements.

To test the queries, use the yaml file where the queries are defined, in this case blogpost.yaml. Parameters to these commands are passed on the command line.

Oh look! You can combine both the inserts and queries in the same command. In this example we are sending 3 queries for every 1 insert. There are 2 singlepost queries and 1 timeline query. You can mix and match whatever number of inserts and queries to suit your needs.
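The example profiles bundled with cassandra-stress list their invocations as comments at the top of the file, and the same habit works here. The sketch below assumes the profile is saved as blogpost.yaml and defines the singlepost and timeline queries; the backslashes escape the parentheses for the shell.

```yaml
# Sketch of a usage header for blogpost.yaml (file name and query names
# are taken from this unit; adjust them to your own profile).
#
# Insert data only:
#   cassandra-stress user profile=blogpost.yaml ops\(insert=1\)
#
# Run a single query defined under "queries":
#   cassandra-stress user profile=blogpost.yaml ops\(singlepost=1\)
#
# Mixed workload, 3 queries for every 1 insert:
#   cassandra-stress user profile=blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)
```

The numbers inside ops(...) are relative weights, so singlepost=2,timeline=1,insert=1 means two singlepost queries and one timeline query for each insert.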

Okay! So enough of hearing me talk! Why don’t you get your hands dirty and work on an exercise?