By The Numbers
• Over 160 million active application installs use our system, across over 80 million unique devices
• The freemium API peaks at 700 requests/second; the dedicated customer API at 10K requests/second
• Over half of those requests are device check-ins
• Transactions: send push, check status, get content
• At any given point in time, we have ~1.1 million secure socket connections into our transactional core
• It took the company 6 months to deliver its first 1M messages; we just broke 4.2B

A Tale of Storage Engines
• PostgreSQL
  • Bootstrapped the company on PostgreSQL in EC2
  • Highly relational, large index model
  • Layered in memcached (a cache-aside sketch follows below)
  • Writes weren’t scaling after ~6 months
  • Continued to use it for several silos of data, but needed a way to grow more easily
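
The “layered in memcached” bullet describes a cache-aside pattern in front of PostgreSQL. A minimal sketch of that pattern using the spymemcached client; the key scheme, TTL, and helper method are hypothetical, not Urban Airship’s actual code.

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class DeviceCache {
    private final MemcachedClient cache;

    public DeviceCache() throws Exception {
        // Address of a local memcached instance (illustrative only).
        this.cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    }

    // Cache-aside read: try memcached first, fall back to PostgreSQL on a miss.
    public String loadDevice(String deviceId) {
        String key = "device:" + deviceId;        // hypothetical key scheme
        String cached = (String) cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = queryPostgres(deviceId);  // stand-in for the real JDBC read
        cache.set(key, 300, fromDb);              // 5-minute TTL, chosen arbitrarily
        return fromDb;
    }

    private String queryPostgres(String deviceId) {
        // Placeholder for a query against the relational model.
        return "device-record-for-" + deviceId;
    }
}
```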

A Tale of Storage Engines
• MongoDB
  • Initially, we loved Mongo
  • Document databases are cool
  • BSON is nice
  • As the data set grew, we learned a lot about MongoDB
  • “MongoDB does not wait for a response by default when writing to the database.” (see the write-concern sketch below)
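
The quote above refers to the unacknowledged default write behavior of the drivers of that era. A minimal sketch with the legacy 2.x MongoDB Java driver showing how a client had to opt in to acknowledged writes; the database, collection, and field names are invented for illustration.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;
import com.mongodb.WriteResult;

public class WriteAcknowledgement {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("example");                      // hypothetical database
        DBCollection devices = db.getCollection("devices");  // hypothetical collection

        BasicDBObject doc = new BasicDBObject("deviceId", "abc123")
                .append("lastCheckIn", System.currentTimeMillis());

        // Default at the time: fire-and-forget; failures are silently dropped.
        devices.insert(doc);

        // Opting in to acknowledged writes: the driver now issues getLastError
        // and surfaces failures, at the cost of a round trip per write.
        devices.setWriteConcern(WriteConcern.SAFE);
        WriteResult result = devices.insert(
                new BasicDBObject("deviceId", "def456")
                        .append("lastCheckIn", System.currentTimeMillis()));
        System.out.println("error (null if the write succeeded): " + result.getError());

        mongo.close();
    }
}
```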

A Tale of Storage Engines
• MongoDB - Read/Write Problems
  • Early days (1.2): one global lock, so reads block writes and vice versa
  • Later, one read lock and one write lock per server
  • Long-running queries were often devastating
    • Replication would fall too far behind and stop
    • No writes or updates
    • Effectively a failure for most clients
  • With replication, queries on anything other than the shard key talk to every node in the cluster (see the sketch below)
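
An illustrative contrast, again with the legacy Java driver, between a query that can be routed by the shard key and one that fans out to every node; the collection layout and field names are assumptions, not Urban Airship’s schema.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;

public class ShardKeyQueries {
    // Assume the collection is sharded on "deviceId" (hypothetical shard key).

    // Routed query: the router can send this to the single shard owning that key range.
    static DBCursor findByShardKey(DBCollection devices, String deviceId) {
        return devices.find(new BasicDBObject("deviceId", deviceId));
    }

    // Scatter-gather query: no shard key in the predicate, so every node
    // holding part of the collection must be consulted.
    static DBCursor findByAppKey(DBCollection devices, String appKey) {
        return devices.find(new BasicDBObject("appKey", appKey));
    }
}
```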

A Tale of Storage Engines
• MongoDB - Update Problems
  • Simple in-place updates (e.g. counters) were fine (contrasted with document-moving updates in the sketch below)
  • Bigger updates commonly resulted in large scans of the collection depending on document position == heavy disk I/O
  • Documents frequently spill to the end of the collection datafile, leaving “holes” but not sparse files
  • Those “holes” get MMap’d even though they’re not used
  • Updates that move data acquire multiple locks, commonly blocking other read/write operations
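
A hedged illustration of the contrast above, using the legacy Java driver: an in-place $inc on a counter versus a $push that grows a document until it no longer fits its allocated slot and has to be relocated (MMAP-era behavior). Field and collection names are invented.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

public class UpdatePatterns {
    // In-place update: the document's size does not change, so it stays where it is.
    static void bumpCheckInCounter(DBCollection devices, String deviceId) {
        devices.update(
                new BasicDBObject("deviceId", deviceId),
                new BasicDBObject("$inc", new BasicDBObject("checkIns", 1)));
    }

    // Growing update: appending to an array enlarges the document; once it no longer
    // fits in its allocated slot it is rewritten at the end of the datafile, leaving
    // a "hole" behind and incurring extra disk I/O and locking.
    static void appendTag(DBCollection devices, String deviceId, String tag) {
        devices.update(
                new BasicDBObject("deviceId", deviceId),
                new BasicDBObject("$push", new BasicDBObject("tags", tag)));
    }
}
```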

A Tale of Storage Engines
• MongoDB - Optimization Problems
  • Compacting a collection locks the entire collection
  • The read slave was too busy to serve as a backup; we needed more RAM but were already on High-Memory EC2 instances, with nowhere else to go
  • Mongo MMaps everything - when your data set is bigger than RAM, you had better have fast disks
  • Until 1.8, no support for sparse indexes (see the sketch below)
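
For reference, the sparse-index feature the last bullet alludes to (available from MongoDB 1.8 onward), sketched with the legacy Java driver; the field name is hypothetical.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

public class SparseIndexExample {
    static void createSparseIndex(DBCollection devices) {
        // Only documents that actually contain "alias" get index entries,
        // keeping the index small when the field is rarely populated.
        devices.ensureIndex(
                new BasicDBObject("alias", 1),
                new BasicDBObject("sparse", true));
    }
}
```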

A Tale of Storage Engines
• MongoDB - Ops Issues
  • Lots of good information in mongostat
  • Recovering a crashed system was effectively impossible without disabling indexes first (not the default)
  • Replica sets never worked for us in testing; lots of inconsistencies in failure scenarios
  • Scattered records lead to lots of I/O that hurt on bad disks (EC2)

Cassandra at Urban Airship
• Why Cassandra? (cont’d)
  • Particularly well suited to working around EC2 availability
  • Needed a cross-AZ strategy - we had seen EBS issues in the past and didn’t trust fault containment within a zone
  • Didn’t want locality of replication, so we needed to stripe across AZs
  • Read repair and hinted handoff generally did the right thing when a node would flap (Ubuntu #708920)
  • No SPoF
  • Ability to alter consistency levels (CLs) on a per-operation basis (see the sketch below)
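
A minimal sketch of choosing a consistency level per operation, shown with the modern DataStax Java driver rather than the Thrift-era clients in use at the time; the keyspace, table, and column names are made up.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PerOperationConsistency {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("push");   // hypothetical keyspace

        // A latency-sensitive read that can tolerate stale data: CL ONE.
        SimpleStatement checkIn = new SimpleStatement(
                "SELECT last_seen FROM devices WHERE device_id = 'abc123'");
        checkIn.setConsistencyLevel(ConsistencyLevel.ONE);
        ResultSet fast = session.execute(checkIn);

        // A write we must be able to read back reliably: CL QUORUM.
        SimpleStatement record = new SimpleStatement(
                "INSERT INTO deliveries (push_id, device_id) VALUES ('p1', 'abc123')");
        record.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(record);

        cluster.close();
    }
}
```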

Battle Scars - Ops
• Java Best Practices:
  • All Java services are managed via the same set of scripts
  • In most cases, operators don’t treat Cassandra differently from HBase
  • Simple mechanism to take a thread or heap dump (a sketch follows below)
  • All logging is consistent - GC, application, stdx
  • Init scripts use the same scripts operators do
  • Bare metal will rock your world
  • -XX:+UseLargePages will rock your world too
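
As one possible illustration of the thread-dump bullet, a small sketch that captures a thread dump of the running JVM via java.lang.management; in practice this is more commonly done externally with jstack or a SIGQUIT, and this class is not Urban Airship’s tooling.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    // Print a thread dump of the current JVM to stdout, similar in spirit
    // to what jstack or a SIGQUIT handler produces.
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info.toString());
        }
    }

    public static void main(String[] args) {
        dump();
    }
}
```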

Looking Forward
• Cassandra is a great hammer, but not everything is a nail
• Coprocessors would be awesome (hint hint)
• Still spend too much time worrying about GC
• Glad to see the ecosystem around the product evolving
  • CQL
  • Pig
  • Brisk
• Guardedly optimistic about off-heap data management