Official Team Blog of RGB Daily

Startupwindow.com was a brief experiment that I ran a couple of years ago. I put on my “Charlie Rose” hat and interviewed a bunch of cool people doing really cool things. Those folks included Marissa Mayer, Guy Kawasaki, Damien Katz and Baratunde Thurston, to name a few. The site has been taken down, but with any luck the YouTube channel that powered it will be up for all eternity: https://www.youtube.com/user/startupwindow

What Steve Jobs achieved in his lifetime is nothing short of amazing. Coming from a blue-collar family and being in the first generation to go to college resonates with me. Be passionate about what you do and never quit. Ideas to live, work and die by. Rest in peace Mr. Jobs.

Open Beta is in full swing and we are thrilled to see the growth in the community. If you haven’t already, please sign up to kick the tires on rgbdaily.com. We really appreciate your feedback and ideas! Improvements are rolling out every day.

Apple is a great American company and I believe they will find their way on the iOS subscription pricing issue. The only true way to be a “lasting” alternative to the open web is to offer price elasticity. Right now Apple has a 30% transaction fee, which is way too high. They would be better off lowering the rate to be in line with other payment processors and then leveraging float. That definitely isn’t as interesting or lucrative. To be 30% of the size of the marketplace you create is the stuff of legend. Unfortunately, buyers and sellers get screwed and they go somewhere else. The guys at Apple are smart; why do they think they can fight basic market dynamics?

Apple didn’t invent the app store. There have been several Java/J2ME type marketplaces for phones over the years. What they did was make the whole user experience silky smooth. Users won’t go back to the dark ages; however, Android offers an equally compelling experience now. Even the B&W Kindle offers an arguably better experience for content acquisition on the device; plus it’s buy once, run anywhere. Amazon has been able to do what it does (content distribution) without pissing everyone off.

I believe Apple is overplaying its hand. Just because you have arguably the best client endpoint for humans on the network doesn’t mean you are the network. Apple is in a worse position than AT&T was at the height of its monopoly in the ’50s. The switching costs are low across the board. Content providers can pull their wares and move to alternatives like Amazon that are already up and running and treat them well. End users can just move to the hottest Android device on their existing network, which is probably outclassing the iPhone 4 on paper.

Taking a step back though, let’s look at everything from Apple’s perspective. It’s clear that application developers have gotten smart and figured out that subscriptions are better business than one-time sales. Most significant games requiring subscriptions have in-app purchases to support virtual currency, which is used to buy an ever-growing array of stuff in the game. That stuff is nothing more than content, and the game developers are happy to pay the 30%, which is less than the 50-80% it costs to acquire a player. So what makes the newspapers so special? Why should they get a free pass?

Rupert Murdoch doesn’t seem to mind greasing the wheels. What does he see that others don’t? Well, running News Corp means that he can appreciate scale. Hell, he can charge 99 cents a week and still eventually put ads in the paper. Targeted ads at that. So this is about reaching as many people as possible with reduced friction. Hmmm, I guess the only problem with that is that pesky thing called the Internet; the World Wide Web specifically. On the web you have more users, more devices, and frankly more opportunities.

What should publishers do?

The simplest strategy for publishers is to continue to support the web and link economy. The browser is becoming increasingly competent and there are partners out there that are willing to help you without stealing the shirt off your back. Google and Facebook will be doubling down on HTML5. In the end, go where the users are; there are more PCs and laptops out there than iPads. Also, being age 13-25 demands a keyboard; life is two-way and fast paced.

Update:

Yup, Google One Pass is definitely an example of supporting the open web.

I’ll keep this really short. We use Membase in production in a couple of ways:

as a traditional memcached front end to MySQL to offload read load

for lightweight persistent counters

blob store for static pages

We love the mature tools around the memcached interface and also the easy scalability of the Membase platform in general. The fact that things just work in terms of data safety is a big draw. Only one thing was really missing: basic query support. After becoming pros at modeling time series data efficiently across several different NoSQL technologies, we had settled on Redis as the most promising, mainly because simple scripts could be written against a fairly robust system of data structures to support the business. Unfortunately, data safety is something the project is still struggling with, and the two-man team is just getting started on a cluster solution. Redis just wasn’t the right fit for us. Our working strategy was to port what we could into Membase as a KV store and use MySQL as an index/search server. Couchbase changes all that and brings the query capabilities we are looking for under one umbrella.

Here is a brief excerpt from the Couchbase guys that has us pretty excited:

In addition to the unrivaled performance, reliability and breadth of the Couchbase family, Couchbase will offer the most feature-rich NoSQL database available: the only document database with strict ACID transaction guarantees, multi-point triggers, user code execution across database nodes with scatter-gather support, indexing and query support, database views, real time map-reduce support, immediately consistent (CP) semantics within a datacenter or zone, and eventually consistent (AP) semantics between data centers or zones. The combination of capabilities Couchbase provides will enable the cost-effective development and deployment of applications previously unimaginable.

RGB is pretty much ready for primetime running on MySQL EBS, and it largely serves us well with Membase as a simple Memcached front end. This works for us because there is an upper limit on the total volume of data/content for the day that is driven by our content publishers. We have other projects in the works, though, that involve the mobile space and operate at Internet scale where there is no gatekeeper. For those projects Couchbase is a great fit: best-in-class features, scalability and persistence that just work. I believe in the one true elastic datastore and I’m glad that a proven company is finally stepping up to the plate to make it a reality.

First off, I apologize for my thoughts being all over the place at 3 AM.

Redis Persistence

My main point in all this is that it will take some time to reach a 1.0 version of a viable, modern persistence solution in Redis. Time would be better spent now at least evaluating other open source persistence engines before diving head first into a very shallow pool. Most of the current tools have been battle tested in a number of environments. Antirez could benefit from the extremely hard work of others to simply save bytes to disk.

The main suggested architecture was a write-through cache similar to how Membase is currently leveraging SQLite for its persistence. Yes, the DB powering Zynga. For this architecture Redis could leverage a suite of prepared statements under transactions, or write extensions, the equivalent of stored procedures, similar to the mysql-sr-lib:arrays I shared.
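To make the write-through idea concrete, here is a rough sketch under stated assumptions. It is not how Membase or Redis is implemented; the WriteThroughStore class, the kv table and the key names are all hypothetical, and Python’s built-in sqlite3 module plays the persistence engine.

```python
import sqlite3


class WriteThroughStore:
    """Hypothetical key-value store: a RAM cache in front of a SQLite table."""

    def __init__(self, path=":memory:"):
        self.cache = {}
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )
        self.db.commit()

    def set(self, key, value):
        # Write to the durable store first, inside a transaction;
        # the cache is only updated once the write has committed.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]  # repopulate the cache on a miss
        return row[0]

    def evict(self, key):
        # Dropping a key from RAM is safe because the durable copy already exists.
        self.cache.pop(key, None)


store = WriteThroughStore()
store.set("user:1", "alice")
store.evict("user:1")        # simulate memory pressure
value = store.get("user:1")  # falls back to SQLite and refills the cache
```

Because every write lands on disk before it is acknowledged, RAM becomes a cache that can be evicted at will rather than the only copy of the data.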

There are great general purpose datastores trapped behind SQL!

The trappings of SQL are directly proportional to its power. As demonstrated in mysql-sr-lib, however, they can be packaged neatly to great effect to deliver a data structure service embedded in battle-tested MySQL/InnoDB running in AWS RDS.

I just meant that query plans have a cost; they are not free. There have been several experiments of late showing the amazing performance you can get by working with InnoDB directly instead of going through MySQL, for example. Basho is just one of the players that has done work in the space. There is also a hook that can be directly embedded in MySQL to bypass the standard workflow and interact directly with the datastore.

I think the number one benefit of Redis was bringing data structures to the head of the table in the battle for the one true datastore. That said, it can be more than a little painful to watch it go through the normal growing pains associated with a promising project. At RGB, we looked at Redis to solve the activity stream problem, like many before us. After getting this all going in EC2 there was some frustration around the uncertainty of persistence in the face of failures that might occur. We are a shop that already decided that the premium for AWS RDS was worth paying to remove most of the pain points associated with MySQL. We have fast-moving data that grows against the stream of world news. When we learned that Redis VM wasn’t performing at scale in production environments it really spooked us. When antirez said he was looking down the barrel of implementing his own BTree (not even the best solution for modern storage backends) from scratch I started to get upset. Like angry upset. When the news started to float about the filesystem datastore (one file per key) I started to look for other solutions. Redis is great; however, we don’t have the operational resources to babysit any piece of our architecture. Redis works great until it hits the wall; that wall being over 80% of your machine’s RAM. A lot of different solutions are being looked at to remedy Redis, but based on our collective experience with all this stuff, few will disagree that it’s early days. I think the best option is for Redis to adopt a pluggable persistence strategy and support an open leader. A couple of solid candidates for embedding are:

Berkeley DB

SQLite

InnoDB

What follows are ideas about how we might go the other way and apply the well-behaved, deterministic data structure operations found in Redis to databases that are industry proven and do the most important thing really well: read and write data.

What else has been done in the space?

In the comments I saw many rock star engineers try to talk antirez off the ledge. Justin Sheehy of Basho pretty much offered up bitcask to solve the persistence problem, but no bite. Don’t get me wrong, I love Redis and the future of Redis Cluster is really bright, but this “not made here” / “our community can wait for our two-man team” attitude can only go so far before the aha moment of Redis is co-opted by other players in the space. This brings me to my first major point. The idea of data structures in the DB is not new:

Not as fast as Redis, but it does what it needs to do (support large datasets with limited memory while keeping your data safe)

With some minor improvements things could be tuned for massive vertical scale (think RDS and big box environments)

Even simple modulo sharding on the keys could allow the system to scale better on many-CPU/core servers by minimizing IO locks. With this strategy the simple tables used to support the datasets are replicated to create partitions. Native MySQL hash partitions might also be a good bet, but just having a separate table means the opportunity to shard at the client level opens up. With client-side key sharding we can now host partitions on different databases.

The library supports array_merge and could be extended easily for diffs for full set support.
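Client-side modulo sharding on the key can be sketched roughly like this. Everything here is an assumption made for illustration: the partition count, the kv table, the put/get helpers, and the use of separate in-memory SQLite databases where a real setup would point at separate MySQL tables or hosts.

```python
import sqlite3
import zlib

NUM_PARTITIONS = 4  # assumption for the sketch; pick to match your cores or hosts


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # crc32 is stable across processes and machines, unlike Python's built-in
    # hash() for strings, so every client routes a key to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# One connection per partition; each could live on a different database server.
partitions = [sqlite3.connect(":memory:") for _ in range(NUM_PARTITIONS)]
for conn in partitions:
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key, value):
    conn = partitions[partition_for(key)]
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def get(key):
    conn = partitions[partition_for(key)]
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


for i in range(100):
    put(f"user:{i}", str(i))
```

The catch with plain modulo is that changing the partition count remaps most keys, so the number of partitions is best chosen generously up front.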

Enter SQLite

So should everyone just move back to MySQL? Hell no, of course not. It’s just proof that antirez has options about where to spend his time. Playing around in the mud with core persistence doesn’t save time. Primary key and single integer secondary key lookups are fast on any database. These types of ranged searches would be limited in nature to just collecting elements of a list. This is fast everywhere. What is slow are the arbitrarily complex index interactions that come with the relational world. Simple data structures make things faster regardless of the technology used in the implementation. So what alternatives do we have for persistence? Given the example above I think SQLite could be a great implementation leveraging the power of stored procedures to implement atomic datasets. Redis wastes the RAM it has on core storage instead of using it effectively as a cache and working memory for arbitrarily large datasets (only limited by node resources). Redis should be only concerned with the client, network and caching layers and leave persistence to more mature and capable backends. Redis could make those backends more capable by orchestrating partitions, backups and bulk data moves once Redis Cluster comes online.
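As a rough illustration of a Redis-style list living on SQLite, consider the sketch below. The table layout and the rpush/lrange helpers are hypothetical names borrowed from the Redis command set, not an existing library. SQLite itself does not ship stored procedures, so here their role is played by small Python functions that each wrap a single SQL statement.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE list_items ("
    " list_key TEXT NOT NULL,"
    " pos INTEGER NOT NULL,"
    " value TEXT NOT NULL,"
    " PRIMARY KEY (list_key, pos))"
)


def rpush(key, value):
    """Append to the tail of a list. One INSERT ... SELECT statement computes
    the next position and writes the row, so the append is atomic."""
    with db:
        db.execute(
            "INSERT INTO list_items (list_key, pos, value) "
            "SELECT ?, COALESCE(MAX(pos) + 1, 0), ? "
            "FROM list_items WHERE list_key = ?",
            (key, value, key),
        )


def lrange(key, start, stop):
    """Return items with start <= pos <= stop (non-negative indexes only).
    This is a range scan on the primary key, the cheap kind of lookup."""
    rows = db.execute(
        "SELECT value FROM list_items "
        "WHERE list_key = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (key, start, stop),
    )
    return [value for (value,) in rows]


for headline in ["a", "b", "c"]:
    rpush("stream:news", headline)

recent = lrange("stream:news", 1, 2)
```

Both operations touch only the (list_key, pos) primary key, which is exactly the kind of simple keyed access that stays fast on any backend.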

SQLite is really only one suggestion. Using Berkeley DB or Innostore could remove the slowness and pain of interacting with the data through SQL. I believe Membase is currently dealing with the pain of slow performance using SQLite once the internal cache is exhausted. Indeed, something like Innostore or bitcask might be a better bet.

Alas, the benefits of Redis are clear and co-optable. For example, Basho’s Riak has a great node/link model for representing data. They added a Solr abstraction on top of this model to court users trying to solve the scalable search pain point. What really stops them from adopting a simple data structures interface for the majority of developers who learned how to think that way in their freshman year? 6.001 (Data Structures) was the first CS class I took at MIT.

Both Facebook and Google have an equal interest in making sure the Internet is dominated by the web browser and not closed apps. Neither search nor Like buttons work in an Apple-led, application-dominant world. How Facebook approaches the iPad should be very telling; I believe Bret Taylor is signalling an HTML5 client that works on multiple tablets. The current iPhone client is very similar to the mobile site in appearance and functionality.

Facebook can only do so much as a walled garden. The number of relationships people can effectively manage is around 150. The social graph is dominated by the photos and newsfeed of those core relationships. After the social graph is established things quickly move to the interest graph, which is dominated by content pulled in via the ‘Like’ button. Facebook needs the greater web not only for content, but also for commerce.

For Google’s part, you can see quite a few examples of their plans for mobile HTML5 across their product suite. HTML5 is really the only way to future-proof and streamline development; both Facebook and Google leverage small teams to get things done. When a single developer is responsible for a whole client product, it’s tough if that knowledge is not shared by others in the organization. For example, rockstar Joe Hewitt was responsible for Facebook’s iPhone app, which stagnated for some time after he went on to Android and other projects. In fact Joe shares an interesting perspective on the fate of the mobile web and what might happen if developers don’t collectively work to support it.

Over the weekend I presented at Podcamp Boston 5. My talk was on potential SaaS (software as a service) architectures which leverage PaaS (platform as a service) hosting options. It provided a couple of alternatives for business founders who are looking to implement a SaaS solution when no technical founder is present. The presentation covers the smart questions and options that usually are not explored on day one when technical resources are outsourced. It starts out with the question, “Which Mark are You?”