Geekout

Description
Eindex is a module for sharding column-based indexes. Using a scheme similar to
how Cassandra shards row keys, we decided to shard column keys. With this
scheme you can still use the Cassandra Random Partitioner and get range queries
for keys. Our goal is to support hundreds of millions of keys across a Cassandra
cluster.

EID is a service for generating unique ID numbers at high scale with some
simple guarantees (based on the work from https://github.com/twitter/snowflake).
The service can run in-memory or as a RESTful web service using Jetty.
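
To make the idea concrete, here is a minimal sketch of a snowflake-style generator. It is not EID's actual code: the class name, the epoch, and the bit layout (timestamp in the high bits, 10 worker bits, 12 sequence bits) are assumptions chosen for illustration.

```java
/** Illustrative snowflake-style ID generator (hypothetical, not the EID source). */
final class SnowflakeStyleIdGenerator {
    // Assumed custom epoch: 2010-01-01T00:00:00Z in milliseconds.
    private static final long CUSTOM_EPOCH_MS = 1262304000000L;
    private static final int WORKER_BITS = 10;   // assumed width
    private static final int SEQUENCE_BITS = 12; // assumed width
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeStyleIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    /** Returns an ID that is unique per worker and increases over time. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            // Same millisecond: bump the sequence number.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted: spin until the next millisecond.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    // busy wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp sits in the high bits, IDs from one generator sort roughly by creation time, and the worker bits keep IDs from different nodes from colliding without any coordination.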

This will be a short post describing how we configure a RAID level 0 drive on
an EC2 instance using EBS drives. For a lot of our functionality we
typically use the ephemeral drives and periodically back up content using
EBS drives and snapshots. We mainly use RAIDed EBS drives to get the maximum
performance out of Amazon EC2 small instances. For example, we have seen
nearly double the performance out of our Cassandra cluster on small instances
using RAIDed EBS drives.

In this post we will walk through setting up a production-ready 3-node
Cassandra cluster with Munin monitoring running on Amazon EC2 in under 30
minutes. We will also walk through getting the sample Cassandra stress scripts
running with a basic load on the 3-node cluster. This post builds on a
previous post about how to set up and maintain an EC2 virtual instance with
our supplied unattended install scripts. If you wish to know more about how
our unattended install scripts work, please review my previous post.

After maintaining several versions of my own private AMIs and realizing what
a pain maintenance was, I decided to find a better solution. There is a lot of
great information on the net if your google-fu is good, but I decided to
compile all the information I use into a couple of scripts and describe each
step in detail so others could understand, modify and use the scripts. The
overriding goal is to allow the flexibility of launching and configuring
remote Amazon EC2 instances in a non-interactive manner.

Let's look at creating and using a simple thread-safe Java in-memory cache. It
would be nice to have a cache that can expire items based on a time to live
as well as keep the most recently used items. Luckily, Apache Commons
Collections has an LRUMap, which removes the least recently used entries from
a fixed-size map. Great, one piece of the puzzle is complete. For the expiration
of items we can timestamp the last access and, in a separate thread, remove the
items when the time to live limit is reached. This is nice for reducing memory
pressure for applications that have long idle times between accesses to the
cached objects. There is also some debate whether the cache should return
a cloned object or the original. I prefer to keep it simple and fast by
returning the original object. So the onus is on the user of the cache to
understand that modifying the returned object will modify the object in the
cache as well. Note that this is also an in-memory cache, so objects are not
serialized to disk.
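
The design described above can be sketched roughly as follows. To keep the example dependency-free it swaps in the JDK's access-ordered `LinkedHashMap` for the Commons Collections LRUMap; the class name `ExpiringLruCache`, its constructor parameters, and the `cleanup` method are hypothetical names chosen for illustration, not the original implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative thread-safe LRU cache with time-to-live expiry (a sketch). */
final class ExpiringLruCache<K, V> {
    /** Wraps a cached value together with its last access time. */
    private static final class CacheItem<V> {
        final V value;
        long lastAccessed = System.currentTimeMillis();
        CacheItem(V value) { this.value = value; }
    }

    private final long timeToLiveMillis;
    private final Map<K, CacheItem<V>> map;

    /** A cleanupIntervalMillis of 0 disables the background cleaner thread. */
    ExpiringLruCache(final int maxItems, long timeToLiveMillis,
                     final long cleanupIntervalMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
        // accessOrder=true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, CacheItem<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CacheItem<V>> eldest) {
                return size() > maxItems;
            }
        };
        if (cleanupIntervalMillis > 0) {
            Thread cleaner = new Thread(() -> {
                try {
                    while (true) {
                        Thread.sleep(cleanupIntervalMillis);
                        cleanup(System.currentTimeMillis());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "cache-cleaner");
            cleaner.setDaemon(true); // do not keep the JVM alive
            cleaner.start();
        }
    }

    synchronized void put(K key, V value) {
        map.put(key, new CacheItem<>(value));
    }

    /** Returns the original object (not a clone), or null if absent. */
    synchronized V get(K key) {
        CacheItem<V> item = map.get(key);
        if (item == null) {
            return null;
        }
        item.lastAccessed = System.currentTimeMillis();
        return item.value;
    }

    synchronized int size() {
        return map.size();
    }

    /** Removes every item not accessed within the time to live, as of nowMillis. */
    synchronized void cleanup(long nowMillis) {
        map.values().removeIf(item -> nowMillis - item.lastAccessed > timeToLiveMillis);
    }
}
```

Synchronizing every method is the simplest way to get thread safety here, since even `get` reorders an access-ordered map. A cache under heavy contention might want finer-grained locking, but that goes against the keep-it-simple goal.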

There doesn’t seem to be a whole lot of open source options for
Java performance counters. Since I found that frustrating and
rolled my own, I decided to share my work so others could just
ditto it. The overarching principle is simplicity, or more
pointedly, KISS. I wanted something fast, simple, easy to use,
fast, simple and thread-safe (did I mention fast and simple?).
After working with Windows C++ performance counters (yuck, talk
about warts!) and .NET performance counters (nice band-aid, but
it still didn’t cover the warts), I opted for a simple,
under-engineered design. Before we dive into some samples,
let's briefly explain the included Java performance counters.
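
As a rough idea of what "simple and thread-safe" can look like, here is a small sketch of a timing counter built on `java.util.concurrent.atomic.LongAdder`. The class `TimingCounter` and its methods are hypothetical and are not the counters shipped with the library.

```java
import java.util.concurrent.atomic.LongAdder;

/** Illustrative thread-safe counter that tracks event count and average duration. */
final class TimingCounter {
    private final String name;
    private final LongAdder events = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    TimingCounter(String name) {
        this.name = name;
    }

    /** Records one event that took elapsedNanos nanoseconds. */
    void record(long elapsedNanos) {
        events.increment();
        totalNanos.add(elapsedNanos);
    }

    long count() {
        return events.sum();
    }

    /**
     * Average duration in milliseconds. The two adders are not read
     * atomically, so under concurrent writes this is an approximation.
     */
    double averageMillis() {
        long n = events.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    @Override
    public String toString() {
        return name + ": count=" + count() + ", avgMs=" + averageMillis();
    }
}
```

A caller would typically wrap the measured code with `System.nanoTime()` and pass the difference to `record`.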

When building and configuring Amazon EC2 instances I find myself
needing to install the Sun Java 6 runtime and/or the JDK unattended.
This is sometimes referred to as a non-interactive or headless install.
The script below is what I typically use to install Java on my
Ubuntu 9.10 instances running on Amazon EC2.

Sometimes it is nice to programmatically run .sql scripts on a MySQL database using Java. This is easily accomplished using the allowMultiQueries configuration property for the MySQL Connector/J driver. When set to true it allows the use of ‘;’ to delimit multiple queries.
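
A minimal sketch of the approach, assuming the MySQL Connector/J driver is on the classpath: the class `SqlScriptRunner`, its method names, and the host, port and database values are placeholders for illustration. Only the `allowMultiQueries` property comes from the driver itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Illustrative helper that runs a whole .sql file in one statement. */
final class SqlScriptRunner {
    /** Builds a JDBC URL with allowMultiQueries enabled. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?allowMultiQueries=true";
    }

    /** Reads the script and executes it; statements are separated by ';'. */
    static void runScript(String jdbcUrl, String user, String password,
                          String scriptPath) throws IOException, SQLException {
        String sql = new String(
                Files.readAllBytes(Paths.get(scriptPath)), StandardCharsets.UTF_8);
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

Note that client-only commands such as `DELIMITER` are interpreted by the mysql command-line tool, not the server, so a script that relies on them would likely need to be adapted first.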

I often find myself writing scripts for Amazon EC2 that need to wait for the instance to become available. Instance availability, for me, is determined by when the SSH service becomes available. Let's create a simple script that polls an SSH connection and waits until it can connect before letting the script continue.
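
The polling idea can also be sketched in Java for cases where the wait happens inside a JVM-based tool rather than a shell script. This is an illustration under stated assumptions: the class `PortWaiter` and its parameters are hypothetical, and a successful TCP connect only shows that something is listening on the port (typically 22 for SSH), not that a login would succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative helper that polls a TCP port until it accepts connections. */
final class PortWaiter {
    /**
     * Tries to connect up to 'attempts' times, pausing between tries.
     * Returns true as soon as a connection succeeds, false otherwise.
     */
    static boolean waitForPort(String host, int port, int attempts, long pauseMillis) {
        for (int i = 0; i < attempts; i++) {
            try (Socket socket = new Socket()) {
                // 2 second connect timeout per attempt (arbitrary choice).
                socket.connect(new InetSocketAddress(host, port), 2000);
                return true;
            } catch (IOException e) {
                // Port not accepting connections yet; fall through and retry.
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A call such as `PortWaiter.waitForPort(host, 22, 60, 5000L)` would poll for roughly five minutes before giving up, which leaves the caller free to decide whether to abort or retry.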