I have tried to understand embedding in MongoDB but could not find good enough documentation. Linking is not advised, as writes are not atomic across documents and each query needs two lookups. Does anyone know how to solve this, or would you suggest moving to a graph database like Neo4j?

I am trying to build an application that needs many-to-many relationships. To explain, I will take the example of a library. It can suggest books to a user based on the books his friends are reading and the books his neighbors (like-minded users) are reading.

There are Users and Books. Users borrow books and have friends, who are other users.

Given a user, I need all the books he is reading and the number of mutual friends for each book.

Given a book, I need all the people who are reading it. Maybe, given a user A, this would return the intersection of the people reading the book and the friends of user A. This is mutual friendship.

As seen above, I generally need two documents if I use MongoDB, as I might need two-way lookups. Duplicating (embedding) one document into another could lead to a lot of duplication (these schemas could store much more information than shown).

Am I modeling my data correctly? Can this be done effectively in MongoDB, or should I look at graph databases?

Thanks for the response, Michael. I am using Python, and I chose MongoDB because of familiarity and the sharding feature. I am still analyzing graph databases like Neo4j and trying to see if I can get similar performance and sharding ability. If my use case is too trivial for a graph database, then using a document store might be easier. What do you say?
– Sai, Feb 15 '12 at 2:07

Your basic schema proposal above would work fine for MongoDB, with a few suggestions:

Use integers for identifiers, rather than strings. Integers will often be stored more compactly by MongoDB (a 32-bit integer takes 4 bytes and a 64-bit integer takes 8 bytes, whereas a string's stored size depends on its length). You can use findAndModify to emulate unique sequence generators (like auto_increment in some relational databases); see Mongoengine's SequenceField for an example of how this is done. You could also use ObjectIds, which are always 12 bytes but are virtually guaranteed to be unique without having to store any coordination information in the database.
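
The sequence-generator idea can be sketched roughly as follows. This assumes a hypothetical `counters` collection with one document per sequence and the running value in a `seq` field; neither name comes from the question, and `db` is assumed to behave like the mongo shell's database handle.

```javascript
// Sketch only: `counters` and `seq` are assumed names, not part of the schema above.
// `db` is expected to behave like the mongo shell's database handle.
function nextSequence(db, name) {
  // Atomically increment the counter document and return the updated version.
  // upsert: true creates the counter on first use; new: true returns the
  // document as it looks after the increment rather than before it.
  var counter = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true,
    upsert: true
  });
  return counter.seq;
}
```

The returned number can then be used as the _id of a new user or book document.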

You should use the _id field instead of id, as this field is always present in MongoDB and has a default unique index created on it. This means your _ids are always unique, and lookups by _id are very fast.
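
To make these suggestions concrete, the two kinds of documents might look roughly like this. The field names `name`, `title`, `friends` and `books` are assumptions for illustration (only `readers` is used elsewhere in this answer), and the id values are made up.

```javascript
// Hypothetical document shapes; field names other than _id and readers are assumed.
var user = {
  _id: 1234,                // integer identifier stored in _id
  name: "Alice",
  friends: [1235, 1240],    // _ids of other users
  books: [42, 77]           // _ids of the books this user is reading
};

var book = {
  _id: 42,
  title: "Example Title",
  readers: [1234, 1235]     // _ids of the users reading this book
};
```

Each relationship is stored on both sides here, so borrowing a book means updating two documents.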

You are right that using this sort of schema will require multiple find()s, and will incur network round-trip overhead each time. However, for each of the queries you have suggested above, you need no more than 2 lookups, combined with some straightforward application code:

"Given a user, I need all books he is reading and number of mutual friends for the book"

a. Look up the user in question, then
b. query the books collection using db.books.find({_id: {$in: [list, of, books, for, the, user]}}), then
c. for each book, compute the set intersection of that book's readers and the user's friends; its size is the number of mutual friends

"Given a book, I need all the people who are reading it."

a. Look up the book in question, then
b. look up all the users who are reading that book, again using $in, like db.users.find({_id: {$in: [list, of, users, reading, book]}})

"May be given a user A, this would return the intersection of people reading book and friends of user A."

a. Look up the user in question, then
b. look up the book in question, then
c. compute the set intersection of the user's friends and the book's readers
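
The application-side step in the first and third queries is a plain set intersection once the documents have been fetched. A minimal sketch, assuming user documents carry a `friends` array and book documents a `readers` array of user _ids (the `friends` name and both helper names are assumptions, not something from the question):

```javascript
// Sketch only: operates on documents already fetched with find();
// `friends` and the function names are assumed for illustration.

// Users who are both friends of `user` and readers of `book`.
function friendsReading(user, book) {
  var friendIds = new Set(user.friends);
  return book.readers.filter(function (readerId) {
    return friendIds.has(readerId);
  });
}

// For each of the user's books, count how many friends are also reading it.
function mutualFriendCounts(user, books) {
  return books.map(function (book) {
    return { bookId: book._id, mutualFriends: friendsReading(user, book).length };
  });
}

// Made-up sample data standing in for the results of the two lookups.
var user = { _id: 1, friends: [2, 3, 4], books: [10, 11] };
var books = [
  { _id: 10, readers: [1, 2, 3, 9] },
  { _id: 11, readers: [1, 9] }
];

console.log(friendsReading(user, books[0]));   // [ 2, 3 ]
console.log(mutualFriendCounts(user, books));
```

Building a Set from the friends list keeps each membership check cheap, which matters when the readers list of a popular book is long.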

I should note that $in can be slow if you have very long lists, as it is effectively equivalent to doing N lookups for a list of N items. The server does this for you, however, so it requires only one network round-trip rather than N.

As an alternative to using $in for some of these queries, you can create an index on the array fields, and query the collection for documents with a specific value in the array. For instance, for query #1 above, you could do:

// create an index on the array field "readers"
// (createIndex replaces ensureIndex, which newer shells no longer provide)
db.books.createIndex({readers: 1})
// now find all books for the user whose id is 1234
db.books.find({readers: 1234})

This is called a multikey index, and it can perform better than $in in some cases. Your results will vary depending on the number of documents and the size of the lists.
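
Conceptually, a multikey index keeps one index entry per array element, so a lookup by a single reader id leads straight to the matching books. The in-memory sketch below imitates that idea with a `Map`; it only illustrates the concept and is not how MongoDB implements its indexes, and `buildReaderIndex` is a made-up helper name.

```javascript
// Sketch only: an in-memory imitation of "one index entry per array element".
function buildReaderIndex(books) {
  var index = new Map();
  books.forEach(function (book) {
    book.readers.forEach(function (readerId) {
      // Each reader id in the array gets its own entry pointing back at the book.
      if (!index.has(readerId)) index.set(readerId, []);
      index.get(readerId).push(book._id);
    });
  });
  return index;
}

// Made-up sample data.
var books = [
  { _id: 42, readers: [1234, 1235] },
  { _id: 77, readers: [1234] },
  { _id: 90, readers: [1235] }
];

var index = buildReaderIndex(books);
console.log(index.get(1234));   // [ 42, 77 ]
```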