Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

I'm looking for the best way to model my new problem: I have many systems which send to the DB a set of metrics (ex: system1 sends every 10 secondes, temp1 - temp2 - acc1 - vel1 - ...)

When I see your 9th slide, I wonder if it's the best way to store the values, as the systems don't have the same metrics, and maybe the metrics can changes over the time (ex: tomorrow, the system1 won't have the temp2, but have acc2 and acc3).

Cassandra NYC 2011 Data Modeling

1.
Data Modeling ExamplesMatthew F. Dennis // @mdennis

2.
Overview● general guiding goals for Cassandra data models● Interesting and/or common examples/questions to get us started● Should be plenty of time at the end for questions, so bring them up if you have them !

3.
Data Modeling Goals● Keep data queried together on disk together● In a more general sense think about the efficiency of querying your data and work backward from there to a model in Cassandra● Usually, you shouldnt try to normalize your data (contrary to many use cases in relational databases)● Usually better to keep a record that something happened as opposed to changing a value (not always the best approach though)

4.
Time Series Data● Easily the most common use of Cassandra ● Financial tick data ● Click streams ● Sensor data ● Performance metrics ● GPS data ● Event logs ● etc, etc, etc ...● All of the above are essentially the same as far as C* is concerned

5.
Time Series Thought Model● Things happen in some timestamp ordered stream and consist of values associated with the given timestamp (i.e. “data points”) – Every 30 seconds record location, speed, heading and engine temp – Every 5 minutes record CPU, IO and Memory usage● We are interested in recreating, aggregating and/or analyzing arbitrary time slices of the stream – Where was agent:007 and what was he doing between 11:21am and 2:38pm yesterday? – What are the last N actions foo did on my site?

6.
Data Points Defined● Each data point has 1-N values● Each data point corresponds to a specific point in time or an interval/bucket (e.g. 5 th minute of 17th hour on some date)

7.
Data Points Mapped to Cassandra● Row Key is id of the data point stream bucketed by time – e.g. plane01:jan_2011 or plane01:jan_01_2011 for month or day buckets respectively● Column Name is TimeUUID(timestamp of date point)● Column Value is serialized data point – JSON, XML, pickle, msgpack, thrift, protobuf, avro, BSON, WTFe● Bucketing – Avoids always requiring multiple seeks when only small slices of the stream are requested (e.g. stream is 5 years old but Im on only interested in Jan 5 th 3 years ago and/or yesterday between 2pm and 3pm). – Make it easy to lazily aggregate old stream activity – Reduces compaction overhead since old rows will never have to be merged again (until you “back fill” and/or delete something)

10.
Querying● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call● Or use a empty start and/or end along with a count

11.
Bucket Sizes?● Depends greatly on ● Average size of time slice queried ● Average data point size ● Write rate of data points to a stream ● IO capacity of the nodes

12.
So... Bucket Sizes?● No Bigger than a few GB per row ● bucket_size * write_rate * sizeof(avg_data_point)● Bucket size >= average size of time slice queried● No more than maybe 10M entries per row● No more than a month if you have lots of different streams● NB: there are exceptions to all of the above, which are really nothing more than guidelines

13.
Ordering● In cases where the most recent data is the most interesting (e.g. last N events for entity foo or last hour of events for entity bar), you can reverse the comparator (i.e. sort descending instead of ascending) ● http://thelastpickle.com/2011/10/03/Reverse-Comparators/ ● https://issues.apache.org/jira/browse/CASSANDRA-2355

14.
Spanning Buckets● If your time slice spans buckets, youll need to construct all the row keys in question (i.e. number of unique row keys = spans+1)● If you want all the results between the dates, pass all the row keys to multiget_slice with the start and end of the desired time slice● If you only want the first N results within your time slice, lowest latency comes from multiget_slice as above but best efficiency comes from serially paging one row key at a time until your desired count is reached

15.
Expiring Streams (e.g. “I only care about the past year”)● Just set the TTL to the age you want to keep● yeah, thats pretty much it ...

16.
Counters● Sometimes youre only interested in counting things that happened within some time slice● Minor adaptation to the previous content to use counters (be aware they are not idempotent) ● Column names become buckets ● Values become counters

18.
Eventually Atomic● In a legacy RDBMS atomicity is “easy”● Attempting full ACID compliance in distributed systems is a bad idea (and actually impossible in the strictest sense)● However, consistency is important and can certainly be achieved in C*● Many approaches / alternatives● I like a transaction log approach, especially in the context of C*

19.
Transaction Logs (in this context)● Records what is going to be performed before it is actually performed● Performs the actions that need to be atomic (in the indivisible sense, not the all at once sense which is usually what people mean when they say isolation)● Marks that the actions were performed

21.
Configuration Details● Short gc_grace_seconds on the XACT_LOG Column Family (e.g. 5 minutes)● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability ● if it fails with an unavailable exception, pick a different node token and/or node and try again (gives same semantics as a relational DB in terms of knowing the state of your transaction)

22.
Failures● Before insert into the XACT_LOG● After insert, before actions● After insert, in middle of actions● After insert, after actions, before delete● After insert, after actions, after delete

23.
Recovery● Each C* has a crond job offset from every other by some time period● Each job runs the same code: multiget_slice for all node tokens for all columns older than some time period (the “recovery period”)● Any columns need to be replayed in their entirety and are deleted after replay (normally there are no columns because normally things are working)

24.
XACT_LOG Comments● Idempotent writes are awesome (thats why this works so well)● Doesnt work so well for counters (theyre not idempotent)● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)