@CompSciFact: "Most software looks more like a whirlpool than a pipeline." #gotoaar

Peter Hunt: My experience with SSDs and Zookeeper has been discouraging. SSDs have some really terrible corner cases for latency. I've seen them take 40+ seconds (that's not a mistake - seconds) for fsync to complete.

What We Found Scanning Millions of Email Systems: We found that, on average, it took 0.3 seconds to establish a connection and 1.4 seconds to complete an SMTP transaction. < So why do we need a new message bus again when email works just fine?

Scalability of CFEngine 3.3.5 and Puppet 2.7.1: At 50 hosts with just the Apache configuration, CFEngine agents run 20 times faster than Puppet agents. At 300 hosts, with 200 echo commands, CFEngine agents run 166 times faster than Puppet agents. Puppet’s architecture is heavily centralised. Puppet runs under the Ruby interpreter. < As one commenter says, CPU is cheap so performance isn't everything.

Monitoring and observability in Complex Architectures. Theo Schlossnagle is our modern-day Virgil, leading us through the ring of Hell devoted to evil optimization problems by the light of effective troubleshooting. Some rules to follow on your trip: direct observation of failure leads to quicker rectification, and you cannot correct what you cannot measure. Have fun.

“INSIDE STORY” ABOUT WHAT HAPPENED AT GODADDY.COM. An undefined event caused a service disruption. Router memory limits were exceeded which caused switching to move to software and as most routers are equivalent to a high-end toaster, the CPU was overwhelmed. Routers in software switching mode could not forward incoming and outgoing DNS traffic fast enough. Cache timeouts caused a large spike in DNS queries. Retries increased. Eventually service was restored by restoring the routing table and throttling the DNS queries with traffic rate-limiters on all of our Internet connection points around the world. Also, Scaling the Database: Database Types.

Five Myths about Hash Tables. Hugh Williams explores his minor obsession with hash tables in a most interesting blog post. The myths: the worst case is terrible (not in practice with a good hash function); when hash tables become full bad things happen (use chained hash tables); trees are better (they are for inexact matches; for exact matches hash tables are better, scoreboard); hash functions are slow (use a better one); hash tables use too much memory (they use the same as a tree).
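To make the "use chained hash tables" point concrete, here is a minimal sketch of a chained hash table in Python. The class and method names are my own, not from the post; the idea is just that when buckets fill up, collisions chain onto per-bucket lists instead of failing catastrophically.

```python
class ChainedHashTable:
    """Toy chained hash table: each bucket holds a list of (key, value)
    pairs, so a "full" table just means longer chains, not failure."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))      # collision: chain onto the list

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

With a good hash function the chains stay short, which is why the "worst case is terrible" myth rarely bites in practice.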

Linux TCP/IP tuning for scalability: After changing all these settings, a single quad core vm (though using only one core) with 1Gig of RAM has been able to handle all the load that’s been thrown at it. We never run out of open file handles, never run out of ports, never run out of connection tracking entries and never run out of RAM. < Lots of great knobs to twist in your own systems.
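The article lists its own settings; as a flavor of the kind of knobs involved, here is an illustrative `/etc/sysctl.conf` fragment. The values below are examples I've chosen for illustration, not the article's numbers, and should be tuned per workload.

```
# Widen the ephemeral port range so outbound connections don't exhaust ports
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Shorten the FIN timeout so dead connections free resources sooner
net.ipv4.tcp_fin_timeout = 15

# Raise listen backlogs for bursts of incoming connections
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096

# Raise the system-wide open file handle limit
fs.file-max = 500000

# Raise the connection tracking table size (if netfilter conntrack is loaded)
net.netfilter.nf_conntrack_max = 262144
```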

Latency: The New Web Performance Bottleneck. Ilya Grigorik makes a great point: when it comes to your web browsing experience, it turns out that latency, not bandwidth, is likely the constraining factor today. If we can't make the bits travel faster, then the only way to improve the situation is to move the bits closer: place your servers closer to your users, leverage CDNs, reuse connections where possible (to avoid repeating TCP slow start), and of course, no bit is faster than no bit - send fewer bits.
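A back-of-envelope model shows why RTT dominates. This is my own simplified sketch (idealized slow start, no losses, illustrative initial congestion window of 4 segments), not Grigorik's math:

```python
import math

def slow_start_rounds(size_bytes, mss=1460, init_cwnd=4):
    """Round trips to send size_bytes under idealized TCP slow start:
    the congestion window doubles every RTT, no losses assumed."""
    segments = math.ceil(size_bytes / mss)
    rounds, sent, cwnd = 0, 0, init_cwnd
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds

def page_fetch_time_ms(rtt_ms, size_bytes, bandwidth_mbps=1000):
    """Rough fetch time: 1 RTT for the TCP handshake, 1 RTT for the
    HTTP request/response, extra slow-start rounds, plus raw transfer."""
    rtts = 2 + (slow_start_rounds(size_bytes) - 1)
    transfer_ms = size_bytes * 8 / (bandwidth_mbps * 1000)
    return rtts * rtt_ms + transfer_ms
```

For a 100 KB response at 100 ms RTT, the RTT term is hundreds of milliseconds while the transfer term at 1 Gbps is under a millisecond: halving RTT roughly halves the fetch time, while multiplying bandwidth does almost nothing.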

Read latency issue. Good Google Groups discussion on improving read latency with Cassandra. Some suggestions: if you want fast performance, specify the full row key; try a non-composite key, or supply all the components of the key in the search; the best performance we got was with row and key caches disabled (no additional GC pressure) and the heap reduced to 3.5GB. Our average read latency is 2ms, and 99% of reads fit in 10ms (1K writes/updates/deletes and 1K reads per Cassandra node per second).

Lessons Learned from a Redis Outage at Yipit. A server outage was caused by running out of memory, which was caused by a lack of disk space, which was caused by deleting a lot of keys to make more room. Some solutions: Redis keys are not forever; namespace all the keys; use separate character sets for the static and dynamic components of key names; watch your disk space.
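The namespacing advice can be sketched in a few lines. This is my own illustration of the idea (the function and separator choices are hypothetical, not Yipit's code): static namespace components and dynamic components use disjoint separator characters, so keys can always be parsed and matched unambiguously.

```python
NAMESPACE_SEP = ":"   # joins the static namespace components
FIELD_SEP = "|"       # joins the dynamic components; must never appear in them

def make_key(namespace, *parts):
    """Build a Redis key name like 'app:users:<id>|<field>'. Rejects
    dynamic parts containing either separator, so the static and dynamic
    character sets stay disjoint and keys remain parseable."""
    for p in parts:
        if NAMESPACE_SEP in str(p) or FIELD_SEP in str(p):
            raise ValueError(f"separator character in dynamic part: {p!r}")
    return NAMESPACE_SEP.join([namespace, FIELD_SEP.join(str(p) for p in parts)])
```

With keys built this way, a pattern like `yipit:user:*` safely matches one namespace and nothing else.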

Rethinking caching in web apps. Martin Kleppmann with a long and thoughtful discussion on the role for caching in data-heavy applications. The goal at the end of a great set of points: My hope is to make caching and cache invalidation as simple as database indexes. You declare an index once, the database runs a one-off batch job to build the index, and thereafter automatically keeps it up-to-date as the table contents change. It’s so simple to use that we don’t even think about it, and that’s what we should be aiming for in the realm of caching.
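A toy sketch of that index analogy, in my own (hypothetical) names: declare a derived cache once, let the store build it in a one-off pass, and have every write keep it up to date automatically, instead of sprinkling manual invalidation through application code.

```python
class IndexedStore:
    """Toy key/value store where caches are declared like indexes:
    built once from existing rows, then maintained on every write."""

    def __init__(self):
        self._rows = {}
        self._caches = {}   # cache name -> (derive_fn, materialized dict)

    def declare_cache(self, name, derive_fn):
        # One-off "batch job": materialize the cache from existing rows.
        self._caches[name] = (derive_fn,
                              {k: derive_fn(v) for k, v in self._rows.items()})

    def put(self, key, row):
        self._rows[key] = row
        # Every write automatically refreshes the derived values,
        # so there is no separate invalidation step to forget.
        for fn, cache in self._caches.values():
            cache[key] = fn(row)

    def cached(self, name, key):
        return self._caches[name][1][key]
```

The point of the analogy is the last comment: invalidation stops being application logic and becomes a property the storage layer guarantees.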

Open Rack 1.0 Specification Available Now. Open hardware for the win. Especially like: The Open Rack is more than just a server cabinet; it’s also an abstraction layer between the server and the rack, like a hardware API.

"Shimon Schocken and Noam Nisan developed a curriculum for their students to build a computer, piece by piece. When they put the course online -- giving away the tools, simulators, chip specifications and other building blocks -- they were surprised that thousands jumped at the opportunity to learn, working independently as well as organizing their own classes in the first Massive Open Online Course (MOOC). A call to forget about grades and tap into the self-motivation to learn."

Reader Comments (1)

Todd,

+1 on "why do we need a new messaging bus again?!" You hit the nail on the head. Why do we need a new service discovery mechanism? Pick DNS, and the protocol over both UDP and TCP, caching, TTLs, client software, etc. come along for free! Why a new messaging/IM/service bus protocol, when we've got 40 years invested in SMTP and it's a plain-text protocol that everything gets along with, with buffering, retries, etc.? Why a new tiny message protocol, given all the great work rsyslog is doing with RELP, TLS syslog, and the new JSON integration? Or virtually anything else, when HTTP + REST fills in virtually all the other gaps?
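The "DNS comes for free" point can be shown with nothing but the standard library. This is a minimal sketch of my own, not a production discovery client: resolve every address published for a service name and pick one at random, with caching and TTLs handled by the resolver stack underneath.

```python
import random
import socket

def discover(service_host, port):
    """Resolve all addresses published for service_host and pick one at
    random -- DNS round robin as a zero-dependency discovery mechanism."""
    infos = socket.getaddrinfo(service_host, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return random.choice(addresses), addresses
```

Publish multiple A records for a service name and every client with a resolver already load-balances across them, no new protocol required.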

I think that a lot of people like to be cool and reinvent the wheel, but if we can start to come up with new uses for the old (but still solid) code, we're all better off. Need a fast highly distributed lookup mechanism that's already proven at Internet scale? DNS works great and is extremely fast. Need a search engine for plain text files? Drop them in a Dovecot IMAP server and take advantage of built-in indexing on plain-text files, along with built-in metadata searches as well. Not that HyperEstraier or Solr (etc) aren't great, but sometimes we just jump on newer, less mature systems too fast and miss out on the stuff that's been quietly chugging along for decades.