
wasimkadak writes with this excerpt from GigaOM: "According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to 'a fate worse than death,' and the only way out is 'bite the bullet and rewrite everything.' Not that it's necessarily Facebook's fault, though. Stonebraker says the social network's predicament is all too common among web startups that start small and grow to epic proportions."

This isn't true. I just migrated an application from MySQL 4.1 to PostgreSQL 9.0 at work. It took me about two weeks, but it certainly wasn't a complete rewrite from scratch. The effort varies greatly depending on the application, the language it's written in, the frameworks in use, and the number of product-specific features in use. This was a Perl/Mason app.

If an application made extensive use of stored procedures, rewriting those would take a lot of effort, but not the whole application. If the application were written in C, it would be a lot of work to change. I think Facebook uses PHP, and that's not too hard to swap out, especially if they were sane and used an abstraction layer like PDO.
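To make the abstraction-layer point concrete, here's a rough sketch in Python terms, where the DB-API plays roughly the role PDO plays in PHP. sqlite3 stands in for MySQL/Postgres, and `get_connection` / `fetch_user_name` are names I made up, not from any real app:

```python
# Sketch of the abstraction-layer idea: if the app only talks to the
# database through the DB-API, swapping backends is mostly a matter of
# changing the connect() call. sqlite3 stands in for MySQL/Postgres.
import sqlite3

def get_connection(backend="sqlite"):
    # The rest of the app never imports a driver module directly.
    if backend == "sqlite":
        return sqlite3.connect(":memory:")
    # elif backend == "postgres": return psycopg2.connect(...)  # same API
    raise ValueError("unknown backend: %s" % backend)

def fetch_user_name(conn, user_id):
    cur = conn.cursor()
    cur.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    row = cur.fetchone()
    return row[0] if row else None

conn = get_connection()
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
name = fetch_user_name(conn, 1)  # -> "alice"
```

The abstraction still leaks a little: placeholder styles differ between drivers (`?` vs `%s`), which is exactly the kind of per-database quirk that makes a migration two weeks of work instead of zero.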

If the app were written in Java or .NET and using an ORM, it would be TRIVIAL to change to another database.

In my experience, the biggest problems were date functions, and the fact that MySQL embeds index creation in the CREATE TABLE syntax whereas Postgres requires it to be separate and makes index names global. That meant I had my work cut out for me renaming indexes. There were also a few quirks with some join queries, since MySQL is not picky about ordering in the FROM clause.
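The index renaming is mechanical enough to script. Here's a toy sketch of the idea; the helper and regexes are mine, not from the actual migration, and they only handle plain inline `KEY` clauses:

```python
import re

def split_inline_indexes(create_sql, table):
    # Hypothetical helper: pull MySQL's inline "KEY name (cols)" clauses
    # out of a CREATE TABLE statement and emit separate CREATE INDEX
    # statements, prefixing the table name because Postgres index names
    # live in one global namespace. Plain KEY clauses only; UNIQUE,
    # FULLTEXT, etc. are left as an exercise.
    keys = re.findall(r"\bKEY\s+(\w+)\s*\(([^)]+)\)", create_sql)
    table_sql = re.sub(r",\s*KEY\s+\w+\s*\([^)]+\)", "", create_sql)
    index_sql = ["CREATE INDEX %s_%s ON %s (%s)" % (table, name, table, cols)
                 for name, cols in keys]
    return table_sql, index_sql

ddl = "CREATE TABLE posts (id INT PRIMARY KEY, author INT, KEY by_author (author))"
table_sql, indexes = split_inline_indexes(ddl, "posts")
# table_sql -> "CREATE TABLE posts (id INT PRIMARY KEY, author INT)"
# indexes   -> ["CREATE INDEX posts_by_author ON posts (author)"]
```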

You are correct that they'll have to tune queries and things, but it's not a total rewrite if they wrote their app in a reasonable way.

For the record, PostgreSQL 9 is faster for many of our queries but seems slower doing INSERTs. YMMV.

Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.

Right.

Some of the key architects of Facebook have spoken at Stanford about how the system is put together, and I went to that presentation and had a chance to talk to them. They didn't consider MySQL to be a bottleneck. Their big problem was PHP performance. They were writing a PHP compiler to fix that.

Internally, the user-facing side of Facebook is in PHP. But the front end machines don't talk directly to the databases. They use an RPC system to talk to other machines that do the "business logic" parts of the system. Building a Facebook reply page may involve a hundred machines. There's heavy caching all over the system, of course, so the databases aren't hit for most read requests.
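That read path is the classic cache-aside pattern. A toy sketch, where a dict stands in for memcached and `db_lookup` is a made-up stand-in for a real query:

```python
# Toy cache-aside sketch of the read path described above. A dict stands
# in for memcached; db_lookup is a hypothetical stand-in for a real
# database query, with a counter so we can see how often the DB is hit.
cache = {}
db_hits = 0

def db_lookup(user_id):
    global db_hits
    db_hits += 1
    return {"id": user_id, "name": "user%d" % user_id}

def get_user(user_id):
    key = "user:%d" % user_id
    if key in cache:              # cache hit: the database is untouched
        return cache[key]
    value = db_lookup(user_id)    # cache miss: one database round trip
    cache[key] = value
    return value

get_user(42)
get_user(42)   # served from cache; db_hits is still 1
```

The flip side is that if the cache layer dies, every read falls through to the database at once, which is exactly the failure mode described further down the thread.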

The RPC system isn't XML, JSON, or SOAP. It's a binary protocol that doesn't require text parsing. Otherwise, RPC would be the bottleneck.
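A rough illustration of why binary beats text on the wire. The field layout here is invented for the example, not Facebook's actual protocol:

```python
import json
import struct

# Pack the same record in a fixed binary layout and in JSON. The binary
# form is smaller, and the receiver reads fields at fixed offsets
# instead of parsing text. The layout (u64 user id, u32 friend count)
# is made up for illustration.
record = {"user_id": 123456789, "friend_count": 4096}

binary = struct.pack("!QI", record["user_id"], record["friend_count"])
text = json.dumps(record).encode()

user_id, friend_count = struct.unpack("!QI", binary)  # no parsing needed
```

A real binary RPC protocol adds type and field headers on top of this, but the principle is the same: fixed layouts instead of text to scan.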

This makes for a flexible, easy-to-enhance system. New services go on new machines, which talk to the existing ones.

Use Postgres. It costs the same as MySQL ($0) and is a hundred times the DB. It offers far better data integrity, supports transactions out of the box, handles databases in the TB range, and is about as standards-compliant as DBs get.

The RPC system they're using is Thrift (http://thrift.apache.org/), which they developed because JSON was becoming a bottleneck. And yeah, there's a metric crapload of memcached in their data centers as well. The multi-hour outage Facebook had late last year was due to a near-complete failure of the memcached layer, resulting in an overload of requests to the main MySQL farms.