cassandra-user mailing list archives

This helps a lot.
However, I can't find any API method that actually lets me do a
slice query on a time-sorted column, as necessary for the second blog
example. I get the following error on r789419:
InvalidRequestException: get_slice_from requires CF indexed by name
Evan
On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> Mail storage, man, I think pretty much anything I could come up with
> would look pretty simplistic compared to what "real" systems do in
> that domain. :)
>
> But blogs, I think I can handle those. Let's make ours multiuser,
> or there isn't enough scale to make it interesting. :)
>
> The interesting thing here is we want to be able to query two things
> efficiently:
> - the most recent posts belonging to a given blog, in reverse
> chronological order
> - a single post and its comments, in chronological order
>
> At first glance you might think we can again reasonably do this with a
> single CF, this time a super CF:
>
> <ColumnFamily ColumnType="Super" ColumnSort="Time" Name="Post"/>
>
> The key is the blog name, the supercolumns are posts and the
> subcolumns are comments. This would be reasonable BUT supercolumns
> are just containers, they have no data or timestamp associated with
> them directly (only through their subcolumns). So you cannot sort a
> super CF by time.
>
> So instead what I would do would be to use two CFs:
>
> <ColumnFamily ColumnSort="Time" Name="Post"/>
> <ColumnFamily ColumnSort="Time" Name="Comment"/>
>
> For the first, the keys used would be blog names, and the columns
> would be the post titles and body. So to get a list of most recent
> posts you just do a slice query. Even though Cassandra currently
> handles large groups of columns sub-optimally, even with a blog
> updated several times a day you'd be safe taking this approach (i.e.
> we'll have that problem fixed before you start seeing it :).
>
> For the second, the keys are blog name<delimiter><post title>. The
> columns are the comment data. You can serialize these a number of
> ways; I would probably use title as the column name and have the value
> be the author + body (e.g. as a json dict). Again we use the slice
> call to get the comments in order. (We will have to manually reverse
> what slice gives us since time sort is always reverse chronological
> atm, but the overhead of doing this in memory will be negligible.)
>
> Does this help?
>
> -Jonathan
>
> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver <evan@cloudbur.st> wrote:
>> Even if it's not actually in real-life use, some examples for common
>> domains would really help clarify things.
>>
>> * blog
>> * email storage
>> * search index
>>
>> etc.
>>
>> Evan
>>
>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> Does anyone have a simple app schema they can share?
>>>
>>> I can't share the one for our main app. But we do need an example
>>> here. A real one would be nice if we can find one.
>>>
>>> I checked App Engine. They don't have a whole lot of examples either.
>>> They do have a really simple one:
>>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
>>>
>>> The most important thing in Cassandra modeling is choosing a good key,
>>> since that is what most of your lookups will be by. Keys are also how
>>> Cassandra scales -- Cassandra can handle effectively infinite keys
>>> (given enough nodes obviously) but only thousands to millions of
>>> columns per key/CF (depending on what API calls you use -- Jun is
>>> adding one now that does not deserialize everything in the whole CF
>>> into memory. The rest will need to follow this model eventually too).
>>>
>>> For this guestbook I think the choice is obvious: use the name as the
>>> key, and have a single simple CF for the messages. Each column will
>>> be a message (you can even use the mandatory timestamp field as part
>>> of your user-visible data. win!). You get the list (or page) of
>>> users with get_key_range and then their messages with get_slice.
>>>
>>> <ColumnFamily ColumnSort="Name" Name="Message"/>
>>>
>>> Anyone got another one for pedagogical purposes?
>>>
>>> -Jonathan
>>>
>>
>>
>>
>> --
>> Evan Weaver
>>
>
--
Evan Weaver
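
For readers following the Comment CF layout described in the quoted
message, here is a minimal Python sketch of the client-side pieces: the
composite row key (blog name<delimiter>post title), the JSON-encoded
comment value, and the in-memory reversal of a newest-first slice. It
does not call Cassandra at all. The "|" delimiter, the helper names,
and the sample tuples standing in for a slice result are all
assumptions made for illustration, not part of any Cassandra API.

```python
import json

# Hypothetical delimiter; the thread only says "<delimiter>".
DELIMITER = "|"


def comment_row_key(blog_name, post_title, delimiter=DELIMITER):
    """Build the Comment CF row key: blog name<delimiter>post title."""
    # If the blog name could contain the delimiter, the key would be
    # ambiguous when split back apart, so reject that case here.
    if delimiter in blog_name:
        raise ValueError("blog name must not contain the delimiter")
    return blog_name + delimiter + post_title


def encode_comment(author, body):
    """Serialize author + body as a JSON dict for the column value."""
    return json.dumps({"author": author, "body": body}, sort_keys=True)


def decode_comment(value):
    """Inverse of encode_comment."""
    return json.loads(value)


def chronological(slice_result):
    """Reverse a newest-first slice so comments read oldest-first."""
    return list(reversed(slice_result))


# Stand-in for what a slice on the Comment CF might hand back:
# (column name, column value, timestamp), newest first.
key = comment_row_key("evans-blog", "Hello World")
newest_first = [
    ("Re: Re: Hello", encode_comment("carol", "Third"), 3),
    ("Re: Hello", encode_comment("bob", "Second"), 2),
    ("Hello", encode_comment("alice", "First!"), 1),
]

for name, value, ts in chronological(newest_first):
    comment = decode_comment(value)
    print(key, ts, name, comment["author"], comment["body"])
```

Because the blog name is checked for the delimiter, splitting the key
on its first delimiter recovers both parts even when a post title
happens to contain that character.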