Friday, November 9, 2007

Notes from the Linked-In: Lessons learned and growth and scalability session at QCon with Jean-Luc Vaillant.

Their architecture includes:

Java (trying out some Ruby, adding some C++, as little as possible)

Oracle 10g and MySQL

Spring

ActiveMQ (tried OracleMQ, doesn't recommend it)

Tomcat & Jetty

Lucene

Graph computations don't perform very well in a relationship database: with large numbers of members, and large numbers of connections, the combinatorics can be staggering. Add to this that simple approaches to storing this information would require extensive joining. Best way to get performance was to run the algorithms on the graph in RAM.

That raises the connection of how to keep the RAM database in sync at all times. One option is to update the database and inform other engines of changes through direct RPC, reliable multicast, JMS. This has the typical problems of two-phase commit.

An alternate approach that LinkedIn has used is to log changes in a transaction log which can be pulled from each graph engine into RAM as necessary. The approach is currently Oracle-specific, but it is applicable to just about any database.

Once that's in place, the in-memory techniques for traversing the graph are far less painful. Breadth-first traversal to get connections of various degrees. Using symmetry to find connections from both sides.

Having run into issues with Read-Write Lock, he prefers Copy On Write.