10.
Cassandra + SSD
But can you get such low latency and high throughput for random reads from disk? With Cassandra + SSDs, YES!
(SSD latency is usually only ~100 µs)

11.
A Random Note About C* and SSDs
● Cassandra can use cheap consumer-grade MLC SSDs (~$1.00 USD / GB)
● Cassandra does no in-place updates, which results in far fewer erase cycles on the drive, so the drive lasts longer
● compared to nearly all other databases, consumer SSDs last ~10x longer on C*
● to put it another way, MLC drives last about as long with C* as enterprise SLC drives last with most other databases
http://www.anandtech.com/show/5518/a-look-at-enterprise-performance-of-intel-ssds
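The endurance argument can be made concrete with back-of-the-envelope arithmetic. The sketch below is a simplified model, not a measurement: it assumes a drive is rated for a fixed number of program/erase (P/E) cycles per cell, that wear leveling spreads writes evenly, and that each byte the host writes turns into `write_amplification` bytes of physical flash writes (random in-place updates tend to raise this factor). The function name and all numbers are hypothetical.

```python
def drive_lifetime_days(capacity_gb: float,
                        pe_cycles: int,
                        host_writes_gb_per_day: float,
                        write_amplification: float) -> float:
    """Estimate days until the drive's rated P/E cycles are used up.

    Simplified model (assumption): the flash can absorb capacity * pe_cycles
    of physical writes in total, spread evenly by wear leveling.
    """
    if host_writes_gb_per_day <= 0 or write_amplification < 1:
        raise ValueError("need positive writes and write_amplification >= 1")
    total_flash_writes_gb = capacity_gb * pe_cycles
    # Every host byte costs write_amplification bytes of actual flash writes.
    physical_writes_per_day = host_writes_gb_per_day * write_amplification
    return total_flash_writes_gb / physical_writes_per_day


# Hypothetical drive: 256 GB, 3000 P/E cycles, 100 GB/day of host writes.
heavy_wa = drive_lifetime_days(256, 3000, 100, write_amplification=10)
light_wa = drive_lifetime_days(256, 3000, 100, write_amplification=1)
print(f"WA=10: {heavy_wa:.0f} days, WA=1: {light_wa:.0f} days")
```

Under this model lifetime is inversely proportional to write amplification: with the hypothetical inputs above the two cases come out to 768 and 7,680 days, so cutting the number of physical erase/rewrite cycles per host write by some factor extends drive life by the same factor.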

12.
Why Not On Rotational Disks Too?
● rotational disks require ~8 ms per seek
● note that this is a HW limitation, an absolute limit on random-read performance (for that HW)
● no system can do better than the seek time when randomly retrieving data from disk (and most do far worse)
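The seek-time limit translates directly into a ceiling on random reads. A minimal sketch, assuming one request is served at a time and every random read costs at least one full seek (or one SSD access); it deliberately ignores request queuing and internal parallelism, which SSDs in particular exploit. The function name is hypothetical; the ~8 ms and ~100 µs figures are the ones quoted in the deck.

```python
def max_serial_random_reads_per_sec(latency_seconds: float) -> float:
    """Upper bound on random reads/s when requests are served one at a time.

    If every random read costs at least `latency_seconds`, a single device
    serving requests serially cannot exceed 1 / latency_seconds reads per second.
    """
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_seconds


HDD_SEEK_S = 8e-3       # ~8 ms per seek on a rotational disk
SSD_LATENCY_S = 100e-6  # ~100 microseconds per SSD read

hdd = max_serial_random_reads_per_sec(HDD_SEEK_S)
ssd = max_serial_random_reads_per_sec(SSD_LATENCY_S)
print(f"HDD: ~{hdd:.0f} reads/s, SSD: ~{ssd:.0f} reads/s, ratio ~{ssd / hdd:.0f}x")
```

That works out to roughly 125 random reads per second for the disk versus roughly 10,000 for the SSD, an ~80x gap before any concurrency is considered.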

13.
What About Writes/Updates?
● all write I/O in Cassandra is sequential
● no global write lock
● no B-trees
● compare to MySQL, BerkeleyDB, MongoDB, Oracle, et cetera, which lock (sometimes with a global lock) and/or generate random writes for updates
● locking is not the only way to handle concurrency!!!
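Sequential writes without in-place B-tree updates follow the general log-structured pattern. The toy sketch below only illustrates that idea under simplifying assumptions; the class and method names are hypothetical and this is not Cassandra's code. Each write is appended to a log and applied to an in-memory table; when the table reaches a threshold it is written out once, sorted by key, as an immutable segment. Nothing already written is ever modified: newer values simply shadow older ones at read time, so no lock on a shared on-disk structure is needed.

```python
from typing import Optional


class ToyLogStructuredStore:
    """Append-only log + in-memory table + immutable sorted segments (toy model)."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.commit_log: list[tuple[str, str]] = []      # sequential appends only
        self.memtable: dict[str, str] = {}               # latest values, in memory
        self.segments: list[list[tuple[str, str]]] = []  # immutable, sorted by key
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # append at the end, never seek back
        self.memtable[key] = value            # overwrites happen only in memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the whole memtable out once, in key order, as a new segment.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed data no longer needs the log

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment wins; a linear scan keeps the toy simple.
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = ToyLogStructuredStore(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # triggers a flush to the first segment
store.put("a", "3")   # shadows the old value; the old segment is untouched
print(store.get("a"), store.get("b"), store.segments)
```

Old versions accumulate in older segments; a real system reclaims them by merging segments in the background, which is also done with sequential I/O.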

14.
Larger Than Memory Datasets
● write performance degrades only marginally as the dataset outgrows memory; there is essentially no change in latency or throughput
● read performance degrades gracefully and depends on the percentage of data held in memory
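Why reads degrade gracefully rather than falling off a cliff can be seen from a weighted average. A minimal sketch, assuming a fraction p of reads hit memory and every other read costs exactly one disk access, so E[latency] = p·t_mem + (1 − p)·t_disk; it ignores concurrency and lookups that touch more than one file. The ~1 µs memory-hit figure is a hypothetical placeholder; the SSD and seek figures are the ones quoted in the deck.

```python
def expected_read_latency(fraction_in_memory: float,
                          memory_latency_s: float,
                          disk_latency_s: float) -> float:
    """E[latency] = p * t_mem + (1 - p) * t_disk  (simplified model)."""
    if not 0.0 <= fraction_in_memory <= 1.0:
        raise ValueError("fraction_in_memory must be in [0, 1]")
    p = fraction_in_memory
    return p * memory_latency_s + (1.0 - p) * disk_latency_s


MEM_S = 1e-6    # hypothetical ~1 microsecond for an in-memory hit
SSD_S = 100e-6  # ~100 microseconds per SSD read
HDD_S = 8e-3    # ~8 ms per rotational-disk seek

for p in (1.0, 0.9, 0.5, 0.0):
    ssd_us = expected_read_latency(p, MEM_S, SSD_S) * 1e6
    hdd_us = expected_read_latency(p, MEM_S, HDD_S) * 1e6
    print(f"{p:.0%} in memory: SSD ~{ssd_us:.1f} us, HDD ~{hdd_us:.1f} us")
```

Latency rises linearly as p drops, and each miss costs about 80x less on an SSD (100 µs) than on an ~8 ms seek.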

18.
Measuring Scalability
If performance is measured in throughput and latency, then scalability is the stability of latency as throughput increases (or the stability of latency and throughput as "load" increases); essentially, scalability is how well a system handles growth.
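This definition suggests a simple check on benchmark data: record latency at several throughput levels and see how far it drifts from the lowest-throughput measurement. The metric below (worst latency divided by baseline latency) and the function name are hypothetical choices for illustration, not a standard; a value near 1.0 means latency stayed stable as throughput grew.

```python
from typing import Sequence


def latency_drift(measurements: Sequence[tuple[float, float]]) -> float:
    """Return max latency / baseline latency over increasing throughput.

    `measurements` holds (throughput_ops_per_s, latency_s) pairs; the baseline
    is the latency measured at the lowest throughput.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    ordered = sorted(measurements)  # sort by throughput
    baseline_latency = ordered[0][1]
    if baseline_latency <= 0:
        raise ValueError("latencies must be positive")
    return max(latency for _, latency in ordered) / baseline_latency


# Hypothetical benchmark results: (ops/s, seconds)
stable = [(1000, 0.002), (2000, 0.002), (4000, 0.0021)]
degrading = [(1000, 0.002), (2000, 0.004), (4000, 0.02)]
print(latency_drift(stable), latency_drift(degrading))
```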

19.
Linear Scalability
If for all values of X, Y, Z, C and N:
    latency = X,   throughput = Y,   "load" = Z,    nodes = N
implies
    latency = X,   throughput = Y,   "load" = CZ,   nodes = CN
Then: the system is perfectly linearly scalable with respect to "load"

20.
Linear Scalability
If for all values of X, Y, Z, C and N:
    latency = X,   throughput = Y,    "load" = Z,   nodes = N
implies
    latency = X,   throughput = CY,   "load" = Z,   nodes = CN
Then: the system is perfectly linearly scalable with respect to throughput
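The two definitions can be turned into a test on a pair of benchmark runs: a baseline on N nodes and a run on C·N nodes. The sketch below encodes them as written on the slides; the `Run` record, the function names and the 5% tolerance are hypothetical, and because real measurements are noisy, "perfectly linear" becomes "linear within tolerance".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    nodes: int
    load: float        # offered "load" (units depend on the benchmark)
    throughput: float  # ops/s
    latency: float     # seconds


def _close(actual: float, expected: float, tol: float) -> bool:
    """True if `actual` is within a relative tolerance of `expected`."""
    return abs(actual - expected) <= tol * abs(expected)


def linear_in_load(base: Run, scaled: Run, tol: float = 0.05) -> bool:
    """Nodes N -> CN and load Z -> CZ, with latency X and throughput Y unchanged."""
    c = scaled.nodes / base.nodes
    return (_close(scaled.latency, base.latency, tol)
            and _close(scaled.throughput, base.throughput, tol)
            and _close(scaled.load, c * base.load, tol))


def linear_in_throughput(base: Run, scaled: Run, tol: float = 0.05) -> bool:
    """Nodes N -> CN and throughput Y -> CY, with latency X and load Z unchanged."""
    c = scaled.nodes / base.nodes
    return (_close(scaled.latency, base.latency, tol)
            and _close(scaled.throughput, c * base.throughput, tol)
            and _close(scaled.load, base.load, tol))


# Hypothetical runs: doubling a 3-node cluster to 6 nodes (C = 2).
base = Run(nodes=3, load=100.0, throughput=30000.0, latency=0.002)
doubled = Run(nodes=6, load=100.0, throughput=59000.0, latency=0.00205)
print(linear_in_throughput(base, doubled), linear_in_load(base, doubled))
```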

21.
Cassandra Scalability
"In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with [nearly linear] increasing throughput from 1 to 12 nodes."
University of Toronto, Canada
Middleware Systems Research Group, et al.
38th International Conference On Very Large Data Bases
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

30.
Thoughts On Availability (and legacy replication)
"High availability implies that a single fault will not bring down your system. Not 'we'll recover quickly.'" -- Ben Coverston, DataStax
"The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test." -- Rick Branson, Instagram