incubator-cassandra-user mailing list archives

Creating more column families in more keyspaces creates more memory overhead, so I do not
believe sharding your data is the way to go with Cassandra.
You mentioned that you read 200 to 300 keys per request, and it sounded like all this data
was for a single user. If you can group all the user data into a single row (or a small,
bounded number, 2 or 3), then your Cassandra requests should perform better, since fewer
machines and less overall I/O will be involved in each request.
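The effect described above can be illustrated with a toy simulation (the cluster size, replication factor, and ring-placement scheme below are made-up assumptions for illustration, not measurements of Cassandra): reading hundreds of randomly placed keys tends to touch nearly every node, while a handful of grouped rows touches only a few.

```python
import hashlib
import uuid

NODES = 12  # hypothetical cluster size
RF = 3      # hypothetical replication factor

def replicas(key: str) -> set:
    # Place a key on RF consecutive nodes of the ring, with the starting
    # position taken from an md5 hash (RandomPartitioner-style placement).
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES
    return {(start + i) % NODES for i in range(RF)}

# 300 scattered per-user keys: almost every node ends up involved.
scattered = set()
for _ in range(300):
    scattered |= replicas(uuid.uuid4().hex)

# The same data grouped into 3 rows: at most 3 * RF nodes involved.
grouped = set()
for row in ("user42:profile", "user42:friends", "user42:events"):
    grouped |= replicas(row)

print(len(scattered), len(grouped))
```

With 300 random keys, `scattered` will almost always cover the whole ring, while `grouped` is bounded by 3 * RF nodes regardless of how much data the rows hold.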
Aaron
On 9 Oct 2010, at 08:11, Jason Horman wrote:
> We are currently using EBS with 4 volumes striped with LVM. Wow, we
> didn't realize you could RAID the ephemeral disks. I thought the
> prevailing opinion for Cassandra was that the ephemeral disks were
> dangerous. We have lost a few machines over the past year, but
> replicas have hopefully prevented real trouble.
>
> How about the sharding strategies? Is it worth it to investigate
> sharding out via multiple keyspaces? Would order preserving
> partitioning help group indexes better for users?
>
> On Fri, Oct 8, 2010 at 1:53 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> Two things that can help:
>>
>> In 0.6.5, enable the dynamic snitch with
>>
>> -Dcassandra.dynamic_snitch_enabled=true
>>
>> which, if you are doing a rolling restart, will let other nodes route
>> around the slow node (at CL.ONE) until it has warmed up (via the
>> background read repairs).
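One way to pass that flag is via the startup script; a minimal sketch, assuming a stock 0.6-era `bin/cassandra.in.sh` where extra JVM options are appended to `JVM_OPTS` (the file location is an assumption about your install, not from the thread):

```shell
# Append to bin/cassandra.in.sh (or wherever your install builds JVM_OPTS)
JVM_OPTS="$JVM_OPTS -Dcassandra.dynamic_snitch_enabled=true"
```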
>>
>> In 0.6.6, we've added save/load of the Cassandra caches:
>> https://issues.apache.org/jira/browse/CASSANDRA-1417
>>
>> Finally: we recommend using RAID 0 over the ephemeral disks on EC2 with
>> L or XL instance sizes for better I/O performance. (Corey Hulen has some
>> numbers at http://www.coreyhulen.org/?p=326.)
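A minimal sketch of that RAID 0 setup on an instance-store box; the device names (`/dev/sdb` through `/dev/sde`), filesystem choice, and mount point are assumptions for illustration, not details from the thread:

```shell
# Stripe the four ephemeral disks into one md device (destroys their contents)
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs -t xfs /dev/md0                 # filesystem choice is arbitrary here
mkdir -p /var/lib/cassandra
mount /dev/md0 /var/lib/cassandra    # point the Cassandra data directory here
```

Remember that ephemeral disks do not survive instance termination, so this setup relies on replication for durability, as discussed above.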
>>
>> On Fri, Oct 8, 2010 at 12:36 PM, Jason Horman <jhorman@gmail.com> wrote:
>>> We are experiencing very slow performance on Amazon EC2 after a cold boot:
>>> 10-20 tps. After the cache is primed things are much better, but it would be
>>> nice if users whose data isn't in the cache didn't experience such slow
>>> performance. Before dumping a bunch of config I just had some general questions.
>>>
>>> We are using uuid keys, 40m of them, with the random partitioner. The typical
>>> access pattern is reading 200-300 keys in a single web request. Are uuid
>>> keys going to be painful because they are so random? Should we be using less
>>> random keys, maybe with a shard prefix (01-80), and make sure that our
>>> tokens group user data together on the cluster (via the order preserving
>>> partitioner)?
>>> Would the order preserving partitioner be a better option in the sense that
>>> it would group a single user's data onto a single set of machines (if we
>>> added a prefix to the uuid)?
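The prefix scheme asked about here could look something like the following sketch; the hash choice, key layout, and shard count are illustrative assumptions (and, per the reply above, grouping rows may matter more than sharding):

```python
import uuid
import zlib

SHARDS = 80  # matches the 80 mysql shards mentioned in the thread

def shard_key(user_id: str) -> str:
    # Derive a stable 01-80 prefix from a hash of the user id, so every key
    # for one user sorts together under an order-preserving partitioner,
    # then append a random uuid for uniqueness.
    shard = zlib.crc32(user_id.encode()) % SHARDS + 1
    return "%02d:%s" % (shard, uuid.uuid4().hex)

print(shard_key("user42"))
```

All keys for the same user share a prefix, so under an order-preserving partitioner they land in one contiguous token range rather than being scattered across the ring.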
>>> Is there any benefit to doing sharding of our own via keyspaces, i.e. 01-80
>>> keyspaces to split up the data files? (We already have 80 mysql shards we
>>> are migrating from, so doing this wouldn't be terrible implementation-wise.)
>>> Should a goal be to get the data/index files as small as possible? Is there
>>> a size at which they become problematic? (Amazon EC2/EBS, fyi)
>>>
>>> Via more servers
>>> Via more cassandra instances on the same server
>>> Via manual sharding by keyspace
>>> Via manual sharding by columnfamily
>>>
>>> Thanks,
>>> --
>>> -jason horman
>>>
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>
>
> --
> -jason