On Sat, Nov 7, 2009 at 7:09 PM, Michael Bleigh wrote:
> So I've been thinking through the architecture of a Twitter-esque
> system in Couch as a kind of thought exercise to get a better handle
> on some of the more difficult corners of view generation. What would
> be the most effective manner of creating Twitter-like status streams?
>
> My initial feeling is to store the followings of a given user as an
> array in the user's document and also have a view that compiles the
> followers of a given user. When a user posts a status update, the
> application would fetch the follower list from that view and simply
> attach it to the status document. It is then a simple matter of a
> composite key map of a given status document to all of the users
> stored within to create a given user's home timeline.
>
> Where this breaks down is your @aplusk scenario. Storing a 3.5 million
> entry array with a document is obviously going to cripple performance
> (at least I would think it would) as well as take up massive disk
> space (I estimated around 7MB for a single JSON status with 1MM
> followers).
>
> So if this solution isn't scalable to millions of users, what's an
> architecture that would be? How do you compose the user's tweet stream
> such that it can be pulled in an efficient manner?
>
> Just trying to start a discussion to help me better understand
> document-oriented architecture, feel free to ignore me!
>
> Michael Bleigh
>
Michael,
It's hard to say definitively what the best design would be, but off
the cuff, and with more experience than the last time I commented on
the "How does tweetcouch work" meme:
Store each follower relation as its own document. When a new tweet
comes in, handle it offline (asynchronously): query a view that does
"emit(person_being_followed, person_following)" for the author's
followers, and copy that tweet into each "person_following"'s stream.
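
A rough sketch of that fan-out, in Python, with plain dicts standing in
for CouchDB documents and for the view index. The field names
("follower", "followed", "owner") and both helper functions are made up
for illustration; none of this is CouchDB API, and a real version would
query the view over HTTP and write the copies back as documents.

```python
from collections import defaultdict

# Hypothetical follower-relation documents: one document per edge.
follow_docs = [
    {"type": "follow", "follower": "alice", "followed": "aplusk"},
    {"type": "follow", "follower": "bob", "followed": "aplusk"},
    {"type": "follow", "follower": "alice", "followed": "bob"},
]


def followers_view(docs):
    """In-memory stand-in for a view whose map function does
    emit(person_being_followed, person_following)."""
    index = defaultdict(list)
    for doc in docs:
        if doc.get("type") == "follow":
            index[doc["followed"]].append(doc["follower"])
    return index


def fan_out(tweet, view, streams):
    """Offline step: copy one tweet into every follower's stream."""
    for follower in view.get(tweet["author"], []):
        # Each copy is its own small document owned by the follower.
        streams[follower].append(dict(tweet, owner=follower))


streams = defaultdict(list)
view = followers_view(follow_docs)
fan_out({"type": "tweet", "author": "aplusk", "text": "hello"}, view, streams)
fan_out({"type": "tweet", "author": "bob", "text": "hi there"}, view, streams)
```

Reading a home timeline is then just a lookup on the owner
(streams["alice"] here), and no single document ever has to carry a
multi-million-entry follower array.
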
It may seem odd, but if you watch twitter streams closely you can see
that they're actually a pretty good case of "eventually consistent".
It's really noticeable when you're firing back and forth right quick
between 2 or more people. Twitter is an interesting study because even
if you send a tweet, and then 30 seconds later another tweet shows up
as having arrived before you sent yours, humans don't really care. The
async nature isn't a problem as long as we get the notice within a
reasonable time. The failure case is something like getting a text
message three days late. I just realized I'm still typing, so let me
know if that answered anything.
HTH,
Paul Davis