DSpace and Derby?

December 14, 2006

Derby is a Java-native RDBMS and the open source successor to IBM’s Cloudscape product. Dan Scott writes that he’s hoping to port DSpace to use Derby instead of PostgreSQL, and reckons that Derby would make a better default database for DSpace.

We’ve had a couple of brief encounters with Hypersonic (HSQLDB) and Derby here, and I’ve been a bit underwhelmed.

The lack of scalability is a problem. Both Hypersonic and Derby start to grind badly once a table holds more than 500,000 rows. This would hurt DSpace in particular, because DSpace stores every metadata statement about every item in a single table, and that tends to be the largest table in the database. Given that each item has on the order of 10 metadata statements, you’re looking at only about 50,000 items before Derby or Hypersonic starts grinding on the simplest operations.
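The arithmetic behind that estimate is a single division. Here is a minimal sketch, assuming the 500,000-row threshold and the rough figure of 10 metadata statements per item quoted above; the class and method names are made up for illustration and are not part of DSpace.

```java
// Back-of-envelope estimate: how many items fit before the metadata table
// reaches the row count at which the embedded databases started to struggle.
// Class and method names are hypothetical; the figures come from the post.
final class MetadataRowEstimate {

    // Integer division: the number of items whose metadata rows fit under the threshold.
    static long itemsBeforeThreshold(long rowThreshold, long statementsPerItem) {
        if (statementsPerItem <= 0) {
            throw new IllegalArgumentException("statementsPerItem must be positive");
        }
        return rowThreshold / statementsPerItem;
    }

    public static void main(String[] args) {
        // 500,000 rows at roughly 10 statements per item
        System.out.println(itemsBeforeThreshold(500_000L, 10L)); // prints 50000
    }
}
```

A repository with richer metadata, say 20 statements per item, would hit the same wall at half as many items.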

I also wonder whether you’re actually getting any advantage in usability. I stopped using Hypersonic in one project recently, not because the per-query performance was poor, but because initializing the database to run in-process took forever. Consequently my choices were to run it in client-server mode or just use Postgres. Maybe Derby doesn’t have such a naive approach to persisting its state to file; I’ll have to look.

The point is that if you can’t run the database in-process and have to fall back to traditional client-server mode, you’ve got just as many problems with security, configuration and so on as you have with PostgreSQL.
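To make the configuration point concrete, here is a rough sketch of what the JDBC connection URLs look like in each mode. It only builds strings, so no driver is needed to run it. The database name `dspace`, the host and the helper names are assumptions for illustration; 1527 and 5432 are the usual default ports for the Derby network server and PostgreSQL.

```java
// Illustrative only: builds JDBC URLs as plain strings, without loading any driver.
// The helper names and the "dspace" database name are hypothetical.
final class JdbcUrls {

    // Embedded (in-process) Derby: no host, no port, just a database name or path.
    static String derbyEmbedded(String dbName) {
        return "jdbc:derby:" + dbName + ";create=true";
    }

    // Derby in client-server mode: a host and port now have to be configured and secured.
    static String derbyNetwork(String host, int port, String dbName) {
        return "jdbc:derby://" + host + ":" + port + "/" + dbName + ";create=true";
    }

    // PostgreSQL: the same kind of host and port configuration as networked Derby.
    static String postgres(String host, int port, String dbName) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        System.out.println(derbyEmbedded("dspace"));
        System.out.println(derbyNetwork("localhost", 1527, "dspace"));
        System.out.println(postgres("localhost", 5432, "dspace"));
    }
}
```

Only the embedded URL is free of network settings; once Derby is reached over a socket, its URL looks much like the PostgreSQL one.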

That said, it’s obvious that Derby has legs for the vast majority of web applications that don’t have large tables or more than one web app accessing the database, and I’ll definitely be giving it a spin on the next DB-backed prototype or play project I have. I’ll be using Hibernate, though, just in case I need to jump ship to Postgres later!
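As a sketch of why Hibernate makes that jump cheap: in principle only a few connection settings change between backends, while the mapped classes stay the same. The example below just builds `java.util.Properties` objects and does not need Hibernate on the classpath. The property keys are standard Hibernate settings; the dialect and driver class names are the ones used by Hibernate 3-era releases and the Derby and PostgreSQL JDBC drivers, and may differ in other versions. The class and method names are hypothetical.

```java
import java.util.Properties;

// Illustrative only: the settings that would differ between the two backends.
// Class and method names are hypothetical; check dialect and driver class names
// against the Hibernate and driver versions actually in use.
final class HibernateBackendSettings {

    // Settings for Derby running in-process.
    static Properties forDerbyEmbedded(String jdbcUrl) {
        Properties p = new Properties();
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.DerbyDialect");
        p.setProperty("hibernate.connection.driver_class", "org.apache.derby.jdbc.EmbeddedDriver");
        p.setProperty("hibernate.connection.url", jdbcUrl);
        return p;
    }

    // Settings for PostgreSQL; nothing else in the application should need to know.
    static Properties forPostgres(String jdbcUrl) {
        Properties p = new Properties();
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
        p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
        p.setProperty("hibernate.connection.url", jdbcUrl);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(forDerbyEmbedded("jdbc:derby:dspace;create=true"));
        System.out.println(forPostgres("jdbc:postgresql://localhost:5432/dspace"));
    }
}
```

This only holds as long as the application sticks to what Hibernate abstracts; hand-written, database-specific SQL would still need porting.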