Hybrid View

Current design of the database

Folks,

we're trying to use Zimbra in production environment but I'm affraid of some misconceptions (IMHO) on the design of the database. May someone please help me understanding the reasons why you did it that way.

1st: Why there's a database for each user ?

The biggest problem is that Zimbra recommends EXT3 as a filesystem but EXT3 have a limitation of 32,000 hardlinks inside each hardlink and it's required to recompile the EXT3 module to change this number.

Other problem is that you cannot select "all users that" without doing the query on *all* databases, this way you cannot build reports based on users' use of the tool, which is very important to ISPs.

Despite all of that, it's ugly.

I can think on some reasons, but none are good enough to stand against it's problems.
- Scale: you can store N users on each node without changing the code for it.
- Better solution: a table with usernames would do the same with a very little and coherent change in code. Each node could have only it's own set of users on the same table.

- Removal: a drop database would remove the user and all it's data at once.
- Better solution: doing a cascade on delete is not that hard even using MYISAM tables.

2nd: Why MAILITEM table have a metadata field ?

It's very hard to do searches (use of LIKE '%foo%' is not at all recomended), it's hard to understand what some row contains unless you remember what code maps to what "type" and it's not convenient to do JOINs.

Also, you obligate the table to have some extra fields, like "filder_id" which not all rows will fill, thus creating a sparse table, which spends space.

The concept of normalization is not an argument because, if everybody would normalize like that, there's no need of "relational" tables. A relational database is intended to have tables, normaly one for each item, relation and type.

I just can't think why on earth someone would do that, sorry... It's worse for performance, space and it's *very* ugly.

It's not brilliant design. It's a concession to performance. A mid-tier system with 1000 users would create about

20 msgs/day * 365 days/year * 1000 users = 7.3M rows/year

rows in the mail_item tables every year. Even if people are good about cleaning out their mailbox you're still talking about numbers on the order of millions of rows per year. You can't have a data set like that in one table and maintain a high level of concurrency.

Same thing for metadata. If we broke it out, we'd end up with 10x the number of rows in the metadata table as there are in mail_item. So far we haven't come up with a good reason to put metadata in a WHERE clause. We can look into it once a good reason comes up.

If you're concerned about the number of files you can alter my.cnf and turn off innodb_file_per_table.

Newbie burging in

Originally Posted by bburtin

Same thing for metadata. If we broke it out, we'd end up with 10x the number of rows in the metadata table as there are in mail_item. So far we haven't come up with a good reason to put metadata in a WHERE clause. We can look into it once a good reason comes up.
Boris

I dont know.... does X- and other kinds of headers count as meta data?

To control the amount of space used, we only store "interesting" information in the metadata. So we don't blindly stuff all the headers in there. Values that the user may want to query get their own columns in mail_item.