Suppose you have a large number of users (M) and a large number of documents (N) and you want each user to be able to mark each document as read or unread (just like any email system). What's the best way to represent this in MongoDB? Or any other document database?

There are several questions on StackOverflow asking this for relational databases, but I didn't see any with recommendations for document databases.

Typically the answers involve a table listing everything a user has read (i.e. tuples of user id and document id), with a possible optimization: a cutoff date that lets mark-all-as-read wipe the stored tuples and start again, knowing that anything prior to that date is 'read'.
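A minimal in-memory sketch of that cutoff idea, assuming the cutoff is kept per user and that each document carries a numeric `date`; the class and field names (`ReadState`, `cutoff`, `readIds`) are hypothetical and only illustrate the logic, not any particular schema:

```javascript
// Hypothetical per-user read state: a cutoff timestamp plus the ids of
// documents the user has read individually since that cutoff.
class ReadState {
  constructor() {
    this.cutoff = 0;           // everything dated <= cutoff counts as read
    this.readIds = new Set();  // explicit read marks made after the cutoff
  }

  markRead(docId) {
    this.readIds.add(docId);
  }

  // Mark-all-as-read: move the cutoff forward and drop the explicit marks,
  // which are now redundant.
  markAllRead(now) {
    this.cutoff = now;
    this.readIds.clear();
  }

  isRead(doc) {
    return doc.date <= this.cutoff || this.readIds.has(doc.id);
  }
}

const state = new ReadState();
state.markRead("doc-2");
console.log(state.isRead({ id: "doc-1", date: 100 })); // not yet read
state.markAllRead(150);
console.log(state.isRead({ id: "doc-1", date: 100 })); // read via cutoff
```

The set only grows between two mark-all-as-read calls, which is what keeps the stored tuples bounded.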

So, MongoDB / NoSQL experts, what approaches to this problem have you seen in practice, and how did they perform?

If all you need is read/unread, you could use a small prefs object per user and message together with MongoDB's upsert capabilities, so that you are not creating prefs for each message unless the user actually reads it: you create the prefs object with your own unique id and upsert it into MongoDB. If you want more flexibility (say, tags or folders), you'll probably want to create a pref for each recipient of the message. For example, you could add:

tags: ['inbox','tech stuff']

to the prefs object and then to get all the prefs of all the messages tagged with 'tech stuff' you'd go something like:
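A minimal sketch of both the upsert and the tag lookup, assuming a hypothetical `prefs` collection whose documents are keyed by a `userId:messageId` string. The field names (`userId`, `messageId`, `read`) and the helper names are illustrative and not taken from the original answer. The helpers only build the filter and update documents, so they run without a database:

```javascript
// Hypothetical helper: one prefs document per (user, message) pair,
// created lazily the first time the user reads the message.
function markReadOp(userId, messageId) {
  return {
    filter: { _id: `${userId}:${messageId}` },            // our own unique id
    update: { $set: { userId, messageId, read: true } },
    options: { upsert: true },                            // insert if missing
  };
}

// Hypothetical helper: filter for all of a user's prefs carrying a tag.
// Matching an array field against a single value matches any element.
function prefsByTagFilter(userId, tag) {
  return { userId, tags: tag };
}

const op = markReadOp("u1", "m42");
console.log(JSON.stringify(op));
console.log(JSON.stringify(prefsByTagFilter("u1", "tech stuff")));

// With a connected shell or driver (not run here), these would be used as:
//   db.prefs.updateOne(op.filter, op.update, op.options);
//   db.prefs.find(prefsByTagFilter("u1", "tech stuff"));
```

An index on `userId` and `tags` would probably be needed for the tag lookup to stay fast as the collection grows.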

Counting how many messages each tag contains efficiently might be a little tricky. If it's only a handful of tags, you can just add .count() to the end of the query for each tag. If it's hundreds or thousands, you might do better with a server-side map/reduce script, or an object that keeps track of message counts per tag per user.
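The last two options could look roughly like this. The sketch assumes prefs objects with an optional `tags` array and a hypothetical per-user counter document with a `counts` sub-object. `countByTag` computes in memory the same tally a server-side map/reduce over the prefs would produce, and `tagCounterUpdate` builds an `$inc` update for the counter document:

```javascript
// Hypothetical: tally messages per tag from an array of prefs objects,
// i.e. the result a map/reduce over the prefs would produce.
function countByTag(prefs) {
  const counts = {};
  for (const pref of prefs) {
    for (const tag of pref.tags || []) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return counts;
}

// Hypothetical: update document for a per-user counter object, applied
// whenever a message gains (delta = 1) or loses (delta = -1) these tags.
// Assumes tag names contain no "." and do not start with "$".
function tagCounterUpdate(tags, delta) {
  const inc = {};
  for (const tag of tags) {
    inc[`counts.${tag}`] = delta;
  }
  return { $inc: inc };
}

const prefs = [
  { tags: ["inbox", "tech stuff"] },
  { tags: ["inbox"] },
  { read: true }, // a pref with no tags at all
];
console.log(countByTag(prefs));
console.log(JSON.stringify(tagCounterUpdate(["inbox", "tech stuff"], 1)));
```

The counter object trades extra writes on every tag change for a single cheap read when the counts are displayed.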

Thanks, so your recommendation is essentially the same kind of 'tuple/join' table as the relational case, right? Any particular reason you store both the messages and the prefs in the same collection?
– Ian Mercer, Nov 15 '10 at 3:11

The thing with MongoDB is that usually the flatter you can make your object, the better. While it can store nested structures, it's not the best at querying those structures or reaching into them later to alter them. So a lot of stuff may end up looking similar to a relational design, but with less abstraction due to not using tables. Also, there is really no reason I store them in the same collection other than not liking to have a bazillion collections. If you do plan on having millions of messages, it might be wise to use different collections so that you can set up the indexes to fit each object better.
– Klinky, Nov 15 '10 at 4:46