28.
Operations in Cassandra 1.0
post['uuid']['title'] = 'First post!';
post['uuid']['user'] = 'mau';
user['mau']['firstname'] = 'Maurits';
How to get a list of blogs by "mau"?
• WHERE user = 'mau'
• Bad Request: No indexed columns present in by-columns clause with Equal operator
• Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY
Tuesday 12 November 13

29.
Operations in Cassandra 1.0
post['uuid']['title'] = 'First post!';
post['uuid']['user'] = 'mau';
user['mau']['firstname'] = 'Maurits';
How to get a list of blogs by "mau"?
• WHERE user = 'mau'
• Bad Request: No indexed columns present in by-columns clause with Equal operator
• Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY
• Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.
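The errors above come from filtering on a column that is not the row key. One common way around this is to keep a second, denormalized column family keyed by user and to update it on every write. The sketch below models the column families as plain Python dicts purely for illustration; it does not use any Cassandra API, and the `posts_by_user` family, the `uuid-*` keys and the helper names are hypothetical.

```python
# Column families modelled as nested dicts: family[row_key][column] = value.
post = {
    "uuid-1": {"title": "First post!", "user": "mau"},
    "uuid-2": {"title": "Second post", "user": "mau"},
    "uuid-3": {"title": "Hello", "user": "someone-else"},
}
user = {"mau": {"firstname": "Maurits"}}

# Hypothetical lookup family maintained by the application:
# posts_by_user[user_name][post_key] = title
posts_by_user = {}


def index_post(post_key):
    """Copy the post's title into the row of its author."""
    row = post[post_key]
    posts_by_user.setdefault(row["user"], {})[post_key] = row["title"]


for key in post:
    index_post(key)


def titles_by(user_name):
    """Read one row by key instead of scanning every post."""
    return sorted(posts_by_user.get(user_name, {}).values())
```

With this layout, `titles_by("mau")` is a single lookup by row key, which is the kind of access the data model favours.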

40.
Beauty?
• Dirty in the SQL world, but:
• It’s a best practice in Big Data
• Don’t think of it as a relational database
• No strict rules on how to use it, just push it to the limits

70.
Eventual consistency
Not guaranteed to be consistent, but becomes consistent later

71.
Eventual consistency
• Best effort
• Configurable consistency level, but no transaction support
• Consistency is not always more important than speed and scalability (doesn’t require locking)
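The effect of a configurable consistency level is often explained with a simple overlap rule: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read touches at least one replica that has the latest write. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and it ignores failure handling.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when the read set and the write set must overlap (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q, read_sees_latest_write(rf, q, q))   # quorum writes and quorum reads
print(read_sees_latest_write(rf, 1, 1))      # single-replica reads and writes
```

Writing and reading at a single replica is the fastest setting and the one where "becomes consistent later" applies; quorum on both sides trades latency for overlap.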

115.
Hack your own
• Not too difficult
• Data can be split into subsets by filtering on tokens
• Application must run on all MapRed nodes
• Probably better performance than Pig / Hive
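Splitting data by token could look roughly like the sketch below: divide the partitioner's token space into contiguous ranges and give each worker one range to scan with the CQL `token()` function. This assumes the Murmur3 partitioner, whose tokens are signed 64-bit integers (RandomPartitioner uses a different range); the table and key names passed in are hypothetical, and the values are interpolated into the string only to keep the example short.

```python
# Assumption: Murmur3 partitioner, tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_ranges(splits):
    """Divide the token space into `splits` contiguous (start, end] ranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // splits for i in range(splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table, key, start, end):
    """Build the CQL text that scans one token range."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({key}) > {start} AND token({key}) <= {end}"
    )


for start, end in token_ranges(4):
    print(range_query("post", "id", start, end))
```

Each worker then runs only its own query, so the subsets do not overlap and together cover the whole ring.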

124.
Cassandra does not ﬁt all
(same story for every NoSQL solution)

125.
Every page (or API call) should require
only a few queries, ideally just one

126.
Static versus Dynamic data
• Static: information that doesn’t change very often
  • E.g.: translations
  • May go in an RDBMS or local storage (files?)
• Dynamic: many changes
  • Changes must be visible on all nodes
  • Use Cassandra

127.
Local versus Global data
• Logging
  • Separate logs per node
• Cache
  • Sometimes no need to share cache between nodes
• Statistics
  • Can be kept local for a limited time

129.
Caching
• Memcache is recommended for local cache
• Cassandra can be used for global cache
  • Has a TTL feature
INSERT INTO ... (...) VALUES (...) USING TTL 86400
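The TTL is given in seconds, so 86400 expires the row after one day. As a small illustration, the sketch below builds such a statement from a `timedelta`; the `page_cache` table and its columns are hypothetical, and the `%s` placeholders are an assumption about the client library's parameter style rather than part of CQL itself.

```python
from datetime import timedelta


def insert_with_ttl(table, columns, ttl):
    """Return an INSERT statement whose row expires after `ttl`."""
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    column_list = ", ".join(columns)
    # One placeholder per column; values are bound by the client at execution.
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {table} ({column_list}) "
        f"VALUES ({placeholders}) USING TTL {seconds}"
    )


# Hypothetical cache table with a key column and a body column.
statement = insert_with_ttl("page_cache", ["cache_key", "body"], timedelta(days=1))
print(statement)
```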

130.
What about ﬁles?
• Use Hadoop Distributed File System (HDFS) or GlusterFS

131.
What about ﬁles?
• Use Hadoop Distributed File System (HDFS) or GlusterFS
• Or use Cassandra

132.
What about ﬁles?
• Split files in chunks to avoid hotspots and save the heap
• Not uncommon to have files in Cassandra
• github.com/Netflix/astyanax
• Gigabytes are OK, but do not store terabytes
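Chunking can be sketched as follows: cut the file into fixed-size pieces and store each piece under its own key, so that no single row holds the whole file and the pieces can land on different nodes. This is a plain-Python illustration, not the Astyanax API; the 64 KiB chunk size, the `(file_id, chunk_number)` key layout and the function names are assumptions.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical chunk size in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Map (file_id, chunk_number) -> bytes, one entry per stored row."""
    return {
        (file_id, offset // chunk_size): data[offset:offset + chunk_size]
        for offset in range(0, len(data), chunk_size)
    }


def reassemble(file_id, rows):
    """Read the chunks back in order and join them."""
    count = sum(1 for (fid, _number) in rows if fid == file_id)
    return b"".join(rows[(file_id, number)] for number in range(count))


data = bytes(range(256)) * 10          # 2560 bytes of sample content
rows = split_into_chunks("report.pdf", data, chunk_size=1000)
restored = reassemble("report.pdf", rows)
```

Because only one chunk needs to be in memory at a time, both the client and the server avoid holding the full file on the heap.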