There is a Facebook Engineering page that has a lot of this type of information, but not quite what you are asking. You may want to ask there and see if you can get an answer. facebook.com/FacebookEngineering
–
John MeagherJun 17 '09 at 19:42

16 Answers
16

Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.

Somewhat useful example:

Table Name: User
Columns:
UserID PK
EmailAddress
Password
Gender
DOB
Location
TableName: Friends
Columns:
UserID PK FK
FriendID PK FK
(This table features a composite primary key made up of the two foreign
keys, both pointing back to the user table. One ID will point to the
logged in user, the other ID will point to the individual friend
of that user)

This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two ways, so you would not need a row in the table such as (2,1) or (3,2) because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two way, you would need to also have those rows to indicate the two-way relationship.

think of how inefficient this is though - you have to do a disjunctive query on the columns of the many-to-many, doubling search time on average.
–
Anthony BishopricApr 4 '11 at 18:29

2

Personally, I wouldn't want those two fields to make a composite primary key. A unique key, absolutely. The clustered index on that unique key, definitely. But I'd also put some sort of non-composite identity as the PK with a nonclustered index. That would allow other tables that need a "friend relationship ID" FK to easily tie to this table and various triggers could fire to cascade events of friending, defriending, etc.
–
Jesse C. SlicerMay 14 '12 at 22:05

It said that Facebook has around 1'000'000'000 users. If the average user has 100 friends, that means the table would contain 100'000'000'000 rows. MySQL partitioning?
–
veidelisJun 4 '14 at 7:30

Forget this approach. If you get a serious amount of users it will definitely become very slow. See my answer and try the to benchmark it yourself. I've done some benchmarking with 10k users and 2.5 million friendship connections and the result was disappointing. If you run a small community it will work fine but there are performance issues to consider.
–
burzumMar 21 at 9:44

I have a feeling you're going to have to explain that a bit more for some people here.
–
TheTXIJun 17 '09 at 19:22

3

I think a more interesting question would be how to persist such a huge structure (we're talking about 200 million nodes and billions of edges) in a way that it can easily be searched and updated.
–
Dirk Vollmar - 0xA3Jun 17 '09 at 19:42

@divo: clever use of indexes and partitions.
–
belgarionthekingJun 19 '09 at 15:23

While this certainly makes the most sense, I would think the performance would be horrendous given how many users Facebook has and how many friends each Facebook user has.
–
Kevin PangJun 17 '09 at 21:42

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher level concepts than most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links on the first two articles that should give you some more insight.

Its not possible to retrive datas from rdbms for user friends datas for datas which crossed more than half a billion at a constant time
so facebook implemented this using hash database(no sql) and they opensourced the database called cassandra
so every user has its own key and the friends details in a queue
to know how cassandra works look at this
http://prasath.posterous.com/cassandra-55

However, this is probably far from what you would find in reality on Facebook's servers. They claim to have >200 million users and we can easily guess that the number of FriendRelations will be in the range of billions. So this will need a highly optimized schema and database engine probably only the people working at Facebook will know.

Probably there is a table, which stores the friend <-> user relation, say "frnd_list", having fields 'user_id','frnd_id'.

Whenever a user adds another user as a friend, two new rows are created.

For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').

Now when showing the friends-list to a particular user, a simple sql would do that: "select frnd_id from frnd_list where user_id="
where is the id of the logged-in user (stored as a session-attribute).

They use a stack architecture with cached graphs for everything above the MySQL bottom of their stack.

Long Answer:

I did some research on this myself because I was curious how they handle their huge amount of data and search it in a quick way. I've seen people complaining about custom made social network scripts becoming slow when the user base grows. After I did some benchmarking myself with just 10k users and 2.5 millionen friend connections - not even trying to bother about group permissions and likes and wall posts - it quickly turned out that this approach is flawed. So I've spent some time searching the web on how to do it better and came across this official Facebook article:

I really recommend you to watch the presentation of the first link above before continue reading. It's probably the best explanation of how FB works behind the scenes you can find.

The video and article tells you a few things (updated):

They must have somehow implemented a graph in MySQL that performs very well

They're using MySQL at the very bottom of their stack

Above the SQL DB there is the TAO layer which contains at least two levels of caching and is using graphs to describe the connections.

I could not find anything on what software / DB they actually use for their cached graphs

Let's take a look at this, friend connections are top left:

Well, this is a graph. :) It doesn't tell you how to build it in SQL, there are several ways to do it but this site has a good amount of different approaches.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.

I really recommend you to create you some sample data with at least 10k user records and each of them having at least 250 friend connections and then run this query. On my machine (i7 4770k, SSD, 16gb RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized, I'm not a DB genius (suggestions are welcome). However, if this scales linear you're already at 1.8 seconds for just 100k users, 18 seconds for 1 million users.

This might still sound OKish for ~100k users but consider that you just fetched friends of friends and didn't do any more complex query like "display me only posts from friends of friends + do the permission check if I'm allowed or NOT allowed to see some of them + do a sub query to check if I liked any of them". You want to let the DB do the check on if you liked a post already or not or you'll have to do in code. Also consider that this is not the only query you run and that your have more than active user at the same time on a more or less popular site.

I think my answer answers the question how Facebook designed their friends relationship very well but I'm sorry that I can't tell you how to implement it in a way it will work fast. Implementing a social network is easy but making sure it performs well is clearly not - IMHO.

I've started experimenting with OrientDB to do the graph-queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.

My guess would be something along the lines of a large key-value store for speed. This probably isn't what you'd be doing for a smaller site, as it makes things a lot more complex. For example, there would be something along the lines of:

Well thinking about Oracle, I've heared that is used for military purpose too... Maybe it has something to deal with a lot of datas too. I think is a big advantage, for facebook, to use a relational database instead of a non relatinal.... but obviusly this is my way of thinking