Schevo and Durus

I saw two presentations on the second day of PyCon which interested me
in combination, one on Schevo and another on Durus. I've always
been reluctant about object databases. ZODB... well, it works, and
I'm sure it works well -- I haven't really had problems -- but it
doesn't make me feel safe.

This isn't because of integrity or stability issues so much as data
stability, portability, upgradability, queriability (is that a word?).
I want to be able to ask questions about my data (like, say, how many
people registered in a certain date range), questions that I didn't
think about when I was setting up my data. And I want to control how
my data changes as my application and the data itself evolves;
generally it should be easy, and it should always be possible --
ALTER TABLE and other operations (like multi-record UPDATE)
mostly do that well in an RDBMS.
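
To make that concrete, here is a minimal sketch using Python's standard
sqlite3 module. The registration table, its columns, and the sample rows
are all hypothetical, invented just for illustration. It asks a question
nobody planned for, then changes the schema and the data in place.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration (name TEXT, registered_on TEXT)")
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [
        ("alice", "2005-01-15"),
        ("bob", "2005-02-20"),
        ("carol", "2005-03-25"),
    ],
)

# An unplanned question: how many people registered in a date range?
# ISO-formatted date strings sort correctly as plain text.
(count_in_range,) = conn.execute(
    "SELECT COUNT(*) FROM registration WHERE registered_on BETWEEN ? AND ?",
    ("2005-02-01", "2005-03-31"),
).fetchone()

# The schema evolves: add a column, then fill it with a multi-record UPDATE.
conn.execute("ALTER TABLE registration ADD COLUMN status TEXT")
updated = conn.execute(
    "UPDATE registration SET status = 'active' WHERE registered_on >= ?",
    ("2005-02-01",),
).rowcount
conn.commit()

print(count_in_range, updated)
```

Neither the query nor the new column had to be anticipated when the
table was first created, which is the property I don't want to give up.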

Schevo is a fairly restricted object model built on top of a database.
It builds indexes and relations and maintains integrity for you, and
seems to have some conventions to control upgrades. These are all the
aspects of an RDBMS that I care about... well, at least most of the
things I care about.

Schevo is really just the schema -- it builds on top of an object
database (PyPerSyst, Durus, or ZODB). Which is where Durus comes into
play -- Durus is kind of a simpler reimplementation of ZODB. The only
real way that ZODB sounded better was that it's threadsafe, so you can
run it in process in a threaded environment. But Durus is
client-server -- like ZEO (the client-server extension to ZODB) -- and
that's good enough for me.

I like an RDBMS for many of its practical advantages -- it's really
good infrastructure. But I'm not in love with it. I want data that
can last for years and across systems; that doesn't have to be an
RDBMS. Right now that's practically the only good option -- XML
persistence is another option, or some other simple flat-file systems,
but those are painful to work with. I still don't feel good about
object databases, but at least this feels like a move in the right
direction.

Created 27 Mar '05

Comments:

I'm trying to wrap my head around some of these issues also. I'm working on two projects now that both use RDBMS backends. One of them seems like a natural fit for an OODB. It's a DB of network devices (routers, switches, APs, etc.) and I've built my own simple ORM for it (could have used SQLObject, I know!). In many places, I already act like I'm using an OODB. E.g., instead of using crafted SQL to pull specific devices out of the DB, I loop over them all ('SELECT * FROM devices'), wrap each row up as an object instance, then check attributes on them to find matches (device.Model == 'Router M10'). I'll be examining this one to see how well something like Durus could replace MySQL. I imagine it would be quite easy.
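
A rough sketch of that pattern, for comparison with the crafted-SQL alternative. It uses Python's standard sqlite3 module as a stand-in for MySQL; the Device class, the devices table layout, and the sample rows are assumptions made up for illustration, not the actual project code.

```python
import sqlite3


class Device:
    """Hypothetical wrapper that turns one result row into an object."""

    def __init__(self, row):
        self.Name = row["name"]
        self.Model = row["model"]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute("CREATE TABLE devices (name TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [
        ("core-1", "Router M10"),
        ("edge-1", "Switch S20"),
        ("core-2", "Router M10"),
    ],
)

# OODB-style: load every row, wrap it, then filter on attributes in Python.
all_devices = [Device(row) for row in conn.execute("SELECT * FROM devices")]
matches_in_python = [d for d in all_devices if d.Model == "Router M10"]

# Crafted SQL: let the database do the filtering.
matches_in_sql = [
    Device(row)
    for row in conn.execute(
        "SELECT * FROM devices WHERE model = ?", ("Router M10",)
    )
]

print(sorted(d.Name for d in matches_in_python))
print(sorted(d.Name for d in matches_in_sql))
```

Both approaches find the same devices. The first reads the whole table on every lookup, which is roughly what walking a collection of persistent objects would look like.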

OTOH, the second DB is much larger and I think of it as containing dumb data. A few well-crafted indexes and queries are used to dump out search results as quickly as possible. There's nothing OO about it.

For what it's worth, whenever I've dabbled with OODBs I haven't been able to quickly answer questions which involve aggregation, and I always want to do that to a greater or lesser extent. You know, the sort of queries like "Give me all of the transactions in the third quarter last year over five thousand dollars from suppliers in Western Australia."

This is the strength that catapulted relational databases over and above their network and hierarchical competitors in the '70s and '80s. Removing the need to re-organise your data when you want to look at it in a slightly different way made a big difference in the real world where change is the only constant. When object databases crack it (perhaps with a consistent query language) then I confidently predict adoption will skyrocket.

There is a lively thread on this very subject on the db-sig mailing list; look for 'Pycon2005 and database divide' at this page: