It's a little difficult to look at a comment about Facebook being a toy application and take it seriously. Yes, Facebook is not directly processing bank transactions on a Tandem, but their site is used to conduct business -- and is even the basis for many businesses, all over the world.

Zynga, the company that makes a few annoying games for Facebook, is worth $15 billion -- more than Electronic Arts.

Nearly every major online publisher, including Anandtech, uses their API for content distribution and often as the entire forum system for discussion of publications.

The founder is the youngest billionaire in history.

Calling theirs a toy application sounds like a Blockbuster customer calling Redbox a toy. It's denial of an obviously successful, large, powerful, innovative company because they don't do things "the old way."

I suspect what matters more is that the business is executing flawlessly, that the actual problems with data loss and other issues traditionally solved by ACID compliance are minimal, and that they are making enough money that Google and Microsoft feel seriously threatened.

One last thing -- if you really look into what ACID compliance means (and I know you did not specifically mention the acronym, but replied to someone who did), none of the current major DBMSes are truly ACID compliant. It's too slow. Not Oracle. Not MSSQL. Not Greenplum. Not Teradata. None of them. They may be closer than NoSQL or the like, but then it's all about the right tool for the job, right?

This is true but ACID can be over-rated for many workloads. How many pieces of data HAVE to be consistent across the entire cluster to be valid? What about NoSQL with configurable consistency like Cassandra?
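To make the "configurable consistency" point concrete: Cassandra lets you tune how many replicas must acknowledge reads and writes. A minimal sketch of the underlying quorum rule (illustrative only, not Cassandra's actual code):

```python
# Sketch of Cassandra-style tunable consistency: with N replicas, a write
# acknowledged by W nodes and a read that queries R nodes are guaranteed
# to overlap (so the read sees the latest write) whenever R + W > N.

def is_strongly_consistent(n_replicas: int, write_acks: int, read_quorum: int) -> bool:
    """True if every read is guaranteed to overlap every acknowledged write."""
    return read_quorum + write_acks > n_replicas

# N=3 with QUORUM reads and writes (2 each): strongly consistent.
print(is_strongly_consistent(3, 2, 2))   # True
# N=3 with ONE for both: fast, but only eventually consistent.
print(is_strongly_consistent(3, 1, 1))   # False
```

This is the knob the comment is pointing at: you pay full consistency only for the data that needs it.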

NoSQL databases provide the holy grail of system growth: horizontal scaling. This is no small thing for anyone who has worked with a very large RDBMS like Oracle and implemented RAC, only to find it doesn't scale all that linearly for most workloads.

It is an ATI ES1000, a server/thin-client chip. That chip is 2D only. I cannot find the power specs, but considering that the chip does not even need a heatsink, I think it consumes maybe 1W at idle.

The ES1000 is almost the same as the Radeon 7000/VE (no, that's not the HD 7000...). Some time in the past you could even force 3D in Linux with the open-source driver, though it usually did not work. The chip also has a dedicated RAM chip (though only a 16-bit memory interface), and I'm not sure how good its power-saving methods are (probably not downclocking, but it supports clock gating) -- not sure if it fits into 1W at idle, but it certainly shouldn't be much more.

I cannot find any good tech resources on the chip, but I can imagine that AMD/ATI have shrunk it since its appearance in 2005. If not, and the chip does consume quite a bit, it is a bit disappointing that server vendors still use it, as the video chip is used very rarely. You don't need a video chip for RDP, for example.

I think the possibility of this chip having been shrunk since 2005 is 0%. The other question is whether it was shrunk from the RV100 or if it's actually the same die. Even if it was shrunk, it was probably to a well-matured process like 130nm in 2005; otherwise it's 180nm. At 130nm (estimated below 20 million transistors) the die would be very small already, and probably wouldn't get any smaller due to I/O anyway. Most of the power draw might be due to I/O too, so a shrink wouldn't help there either. It is possible, though, that it's really below 1W when idle.

There's a limit to how much I/O you can have for a given die size (actually the limiting factor is not area but circumference, so making the die rectangular sort of helps). I/O pads apparently don't shrink well, so if your chip has to be a certain size because it has too many I/O pads, a shrink will do nothing but make it more expensive (since smaller process nodes are generally more expensive per area). Being I/O-bound is quite possible for some chips, though I don't know if this one really is -- it has at least display outputs, a 16-bit memory interface, a 32-bit PCI interface, and the required power/ground pads. In any case, even at 180nm the chip should be below 40mm² already, so the die cost is probably quite low compared to packaging, the cost of memory, etc.

It's the integrated BMC/iLO solution, which also includes a GPU that would use more power than the ES1000 anyhow. That is also what is lacking in the simple Google/Facebook compute-node setup. They don't need that kind of management and can handle a node going offline.

I would definitely have to agree with you on this notion. HP servers are pretty expensive once you take into account 3-year warranties and 24/7 replacement options, so going with an Open Compute server is a nice alternative to the "I can do everything" server. Better to stick to something you can do well and efficiently than to do many things poorly.

I would be quite interested in how they determined that Java and C# are 2-3x slower than C++, since that seems pretty at odds with reality to me. I have seen a few C++ vs. Java tests and the differences were a matter of a few percent. In my experience C# does the same jobs a little faster than Java, and benchmark results generally confirm it.

I'm not surprised that part of the article would lead to programming language holy wars, but general benchmarks are utterly useless for Facebook. They should (and surely do) care only about performance of the compiled code and hardware platforms that run the site.

It's illogical to suggest that a JIT-compiled language like Java or C# could ever approach C++ in speed when the same level of optimization is applied to each.

In my experience, the least optimized C++ code can sometimes be approximated in performance by the best optimized Java code, depending on the task in question.

Of course, once you spend time optimizing the C++ code then there is no way for Java to keep up.

I have never used C# but I expect the result for it would be very similar to Java due to the similar mechanics of the language implementation.

That being said, in many situations raw speed is not the most important factor, and Java and C# can have significant advantages in terms of deployment mechanisms, programmer productivity, etc., that can make those languages very much the best choice. Which is why they are, in fact, used in those situations in which their advantages are best exploited and their weaknesses are least important.

I think that Ruby takes the last paragraph even further; Ruby is so ungodly slow that it has to make up for it with extreme productivity gains, and I expect that it does (I've never programmed in it to any significant extent), otherwise it wouldn't have any niche at all.

I agree that in some cases a JIT compiler can produce more efficient code, particularly when the application lends itself to runtime optimizations, however that is far from typical. Usually, for a single process, the JIT code, once compiled, will be reasonably close, though the static C/C++ code has the edge.

But that is for the typical case. Facebook is not a typical case. Each web server is constantly starting many, many short-lived processes. Each process must start up its own copy of the code. This is where JIT compares badly with ahead-of-time compilation. It isn't the execution speed of the code after the JIT gets it compiled; the problem is the startup delay. Even with caching, the bytecode still must be compiled at least once for each new process, which in Facebook's case is millions of times. There is no such delay with ahead-of-time compilation. Therefore, Java and C# have no chance of competing in Facebook's environment.
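A toy cost model of that trade-off (the millisecond figures are invented for illustration, not measurements of any real runtime):

```python
# Toy cost model: a JIT runtime pays a compile/warm-up cost in every new
# process, while an ahead-of-time (AOT) binary paid it once, at build time.
# All numbers below are assumed for illustration.

def total_runtime_ms(processes: int, startup_ms: float, work_ms: float) -> float:
    """Total time spent across many short-lived processes."""
    return processes * (startup_ms + work_ms)

# One million short-lived processes, each doing ~10ms of real work.
aot = total_runtime_ms(1_000_000, startup_ms=1, work_ms=10)
jit = total_runtime_ms(1_000_000, startup_ms=200, work_ms=8)  # faster steady state, slow start

print(jit / aot)  # JIT loses badly when processes are short-lived
```

The steady-state speed of the JIT-compiled code barely matters here; the per-process warm-up dominates.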

I wonder whether the power savings justify creating new hardware with a green power architecture, plus the cost of a custom-built power supply running on 277V, if it only saves about 10-20 percent of average power consumption -- rather than, say, making a corporate deal with the best power/performance server producer on the market and modifying those servers with water cooling (for example)?

Just looking at the final image in the article, there are easily 30 racks of 30 servers visible, giving (30 x 30 x $17.50 =) $15,750/year in power savings.

Since most power going into a computer ends up as waste heat, if the 900 servers (from above) were consuming the additional 20W each, this would be ~18 kW of additional heat being produced which needs to be cooled. This offers additional operational and capital cost savings due to the smaller cooling requirements.
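The arithmetic in the two comments above, spelled out (assuming essentially all server power ends up as heat):

```python
# 30 racks of 30 servers, each saving $17.50/year and 20W of power.
servers = 30 * 30                      # 900 servers
dollars_per_year = servers * 17.50     # annual electricity savings
extra_heat_kw = servers * 20 / 1000    # extra heat if each drew 20W more

print(dollars_per_year)  # 15750.0
print(extra_heat_kw)     # 18.0
```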

Water cooling may be a more efficient way of pulling heat out of the server rack, but the additional parts to move the water around the facility and to cool it adds to the total costs. Water is more efficient because it carries more heat/volume than air and with the piping the heat can be taken outside of the server room, while fans heat the air around the servers where another method of removing the heat is then required.

The custom power supply at 277V and the custom motherboard aren't really that difficult to get, as many makers of each part already do custom designs for major PC makers (Dell/HP/etc.). The difference between 208V and 277V from an electrical design standpoint isn't a big change, and neither is removing parts from a motherboard.

In short, it's economies of scale. You or I wouldn't be able to do this for a dozen personal systems, as the cost per system would be huge; on the other hand, for anyone managing thousands of servers, the 20W per box adds up quickly.

Hundreds or thousands of times more is more likely. FB has grown to the point of building its own data centers instead of leasing space in other people's. Large data centers consume multiple megawatts of power. At ~100W/box, that's 5-10k servers per MW (depending on cooling costs); so that's tens of thousands of servers per data center, and data centers scattered globally to minimize latency and traffic over long-haul trunks.
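A rough sketch of that servers-per-megawatt estimate (the assumption that cooling can eat roughly half the facility power, i.e. a PUE near 2, is mine):

```python
# Servers per megawatt at ~100W per box.
mw = 1_000_000  # watts

best_case = mw // 100        # every watt feeds a server
with_cooling = mw // 2 // 100  # ~half the power goes to cooling (PUE ~ 2)

print(best_case)     # 10000
print(with_cooling)  # 5000
```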

When you've got thousands of these things in a building, consuming untold MW, you'd kill your own grandmother for half that savings. And water cooling doesn't save any energy at all -- it's simply an expensive and more complicated way of moving heat from one place to another.

For those unfamiliar with it, 480 VAC three-phase is a widely used commercial/industrial voltage in USA power systems, yielding 277 VAC line-to-ground from each of its phases. I'd bet that even those light fixtures in the data center photo are also off-the-shelf 277V fluorescents of the kind typically used in manufacturing facilities with 480V power. So this isn't a custom power system in the larger sense (although the server level PSUs are custom) but rather some very creative leverage of existing practice.
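The 277V figure falls straight out of three-phase math: line-to-ground voltage is the line-to-line voltage divided by √3.

```python
import math

# 480 VAC line-to-line, three-phase: each phase to ground is V / sqrt(3).
line_to_line = 480
line_to_ground = line_to_line / math.sqrt(3)

print(round(line_to_ground))  # 277
```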

Remember also that there's a double saving from reduced power losses: first from the electricity you don't have to buy, and then from the power you don't have to use for cooling those losses.

No, the most important rule is that the warm air from one heatsink should not enter the stream of cold air for the other. So placing them next to each other is the best way to do it, and placing them serially the worst.

Placing them further apart will not accomplish much, IMHO. Most of the heat is drawn away to the back of the server; the heatsinks do not get very hot. You also lower the air speed between the heatsinks.

"The next piece in the Facebook puzzle is that the Open Source tools are Memcached."

In fact, the tools are not memcached. Instead, software objects from the PHP/C++ stack, programmed by the engineers, are stored in Memcached. Side note -- those in the know pronounce it "mem-cache-dee", emphasizing with the last syllable that it is a network daemon (similar to how the DNS server "bind" is pronounced "bin-dee"). So the next piece is Memcached, but the tools are not 'memcached'.

That is something that went wrong in the final editing by Jarred. Sorry about that, and I feel bad about dragging Jarred into this, but unfortunately that is what happened. As you can see further on, "Facebook mostly uses memcached to alleviate database load" -- I was not under the impression that the "Open Source tools are Memcached." :-)

I was pretty sure it was a mistake, and I only mentioned it to have the blemish removed -- I've been following and admiring your technical writing since the early 2000s. Please keep on bringing us great server architecture pieces. Don't worry about Jarred, he's fine too. We all make mistakes.
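For readers unfamiliar with how objects end up in memcached, here is a minimal cache-aside sketch. A plain dict stands in for a real memcached client (a real deployment would use a memcached client library with equivalent get/set calls), and the function and key names are made up for illustration:

```python
# Cache-aside pattern: objects built by the application are stored in a
# cache (memcached in Facebook's case) so repeat requests skip the database.

cache = {}       # stand-in for a memcached client
db_queries = 0   # counts how often we hit the "database"

def slow_db_lookup(user_id):
    """Pretend database query (hypothetical helper)."""
    global db_queries
    db_queries += 1
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    obj = cache.get(key)
    if obj is None:                  # cache miss: go to the database...
        obj = slow_db_lookup(user_id)
        cache[key] = obj             # ...and populate the cache
    return obj

get_user(42); get_user(42); get_user(42)
print(db_queries)  # 1 -- only the first request hit the database
```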

I personally doubt that very much. The memcached servers are hardly CPU-intensive, but a 32-bit ARM processor will not fit the bill. Even when ARM gets 64-bit, it is safe to say that x86 will offer many more DIMM slots. It remains to be seen how the watts per GB of RAM cache ratio will work out. Until 64-bit ARMs arrive with quite a few memory channels: no go, IMHO.

And the processing intensive parts of the facebook architecture are going to be very slow on the ARMs.

The funny thing about the ARM presentations is that they assume that virtualization does not exist in the x86 world. A 24-thread x86 CPU with 128 GB can maybe run 30-60 VMs, lowering the cost to something like 5-10W per VM. A 5W ARM server is probably not even capable of running one of those machines at a decent speed. You'll be faced with serious management overhead to deal with 30x more servers (or worse!) and high response times (single-threaded performance takes a huge dive!) just to save a bit on the power bill.

As a general rule: if Atom-based servers have not made it onto the shortlist yet, they are surely not going to be replaced by ARM-based ones.
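For what it's worth, the 5-10W-per-VM figure above is consistent with a host drawing roughly 300W under load (an assumed figure, not from the article):

```python
# Per-VM power on a virtualized x86 host running 30-60 VMs.
host_watts = 300  # assumed host power draw, for illustration

print(host_watts / 30)  # 10.0 W per VM at 30 VMs
print(host_watts / 60)  # 5.0 W per VM at 60 VMs
```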

TBH we'd position SL-class servers for this kind of scenario rather than the DL380 G7 (which does have a DC power option, btw), so I'm not sure it is a relevant comparison. Though I understand using what is available to test.

This review claims the Facebook server is more efficient than HP's, but I see no proof. They only compare the power supplies' power factor performance. But what about efficiency? I guess the lab has no 277VAC input (which most datacenters don't have either), so they can only power the servers at 208/230VAC. As a result, they can't compare the servers' efficiency. Also, they didn't describe at what load condition the test was done... I am sure the HP server has better efficiency than the Facebook one at 230VAC input. The only good thing about the Facebook one is that it might not need a UPS. But the consequence is that you have to use the battery rack from Facebook, which is not standard and can be costly.

Also, it is nice to know that the Power-One power supply will overheat when using DC input for more than 10 minutes... hahahahh... that's a smart way to cut costs on the power supply...

Google doesn't use relational databases to store and retrieve its information either. Neither does the high performance data warehouse that was developed on a program I worked on a few years ago - we migrated away from Oracle for cost and performance reasons.

I think that the days of the Relational Database are numbered. The mainstay of the Relational Database (stored procedures) is quickly showing its age, with a complete inability to debug issues outside of expensive specialized tools. We've been replacing them as much as we can with an abstraction layer.

But we still have goofy constructs to deal with (joins just don't make sense from an OO perspective).

The RDBMS's days are numbered only for those who've no idea that what they think is "new" is just their grandpappy's olde COBOL crap. Just because you're so young, so inexperienced, and so stupid that you can't model data intelligently doesn't mean you've got it right. But if you're in love with getting paid by LOC metrics, then Back to the Future is what you want. Remember, Facebook and Twitter and such are just toys; they ain't serious.

I can see how the OpenCompute compares well to a DL380G7 in terms of performance vs power consumption and may compare well in price (those details aren't readily available), but the things that the OpenCompute has going for it are that it has been stripped of unneeded components, fitted with efficient fans and matched to efficient power supplies. From what I have seen and done in and around datacenters, these are exactly the objectives of a blade based system, where you can have large, efficient power supplies, large fans and missing or shared devices that are non-critical. I would like to see this article modified to include a comparison against a blade-based solution of equivalent specification to see how that stacks up - if you can swing it, use a fully populated blade chassis and average out the results against the number of blades. The blades also have an advantage of allowing approximately 14 servers in a 9 RU space - allowing approximately 70 servers per 45 RU rack, vs the 30 odd of the OpenCompute.

Whenever I need to put equipment into a datacenter, the important specifications are performance, cost price, power efficiency, size, weight and heat. Whenever a large number of servers are required, blades always stack up well, possibly with the exception of weight where there are limitations on floor-loading in a datacenter, but they do compare well with weight when compared to equivalent performing non-blade servers (such as 28 RU of DL380G7s).
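The blade-density claim above checks out arithmetically:

```python
# 14 blades per 9 RU chassis vs. roughly 30 OpenCompute servers per rack.
rack_ru = 45
chassis_per_rack = rack_ru // 9           # 5 chassis fit in a 45 RU rack
blades_per_rack = chassis_per_rack * 14

print(blades_per_rack)  # 70
```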

Although I think blades could be favorable, at least if you take into account the infrastructure reduction such as networking ports. The thing is, if you look at the HP products that are available, there are better alternatives.

HP, as the specific example, has a product called the SL6500. It's a second-generation product specifically designed for these types of environments, and meant to compete with exactly the type of system that Facebook created. A comparative use case would be an 8-node configuration, which would take up 4U of rack space and could run off 2-4 PSUs shared between the nodes. Additionally, it has a shared redundant fan configuration that uses larger, more efficient fans to cool the chassis. It's like blades, but doesn't have any shared networking, is made specifically to be lighter and cheaper, and has options for lower-cost nodes.

The DL380 has a few things working against it in this comparison: hot-swappable drives, enterprise-class onboard management (iLO, not just a basic BMC), redundant fans, scalable power infrastructure, 6 PCIe slots, an onboard high-performance RAID controller, 4 NICs, simplified serviceability with single-tool and/or tool-less access, and even a display for component failures.

The SL6500 would be able to have very basic nodes, with non-hot-swap SATA drives, basic SATA RAID, dual NICs, and features much more in line with the Facebook system. Sure, it wouldn't be as specific to Facebook's needs, but it would be a more interesting comparison, as it would at least compare two systems designed for similar roles: not a general enterprise compute node against a purpose-built scale-out system, but two scale-out platforms.

You have different cooling requirements also. Obviously Google's or Facebook's option isn't about maximum density per rack. But they are also not using any traditional hot aisle/cold aisle setup. Nor will all datacenters be able to handle your 20-30kW rack, in terms of cooling requirements and power.

I'm not sure how much of the benchmarks depend on network bandwidth, but Facebook certainly does a lot of it. Using SR-IOV-based NICs and supporting drivers allows a VM to access virtual NIC hardware directly, without having to go through the hypervisor. But all NICs aren't built equal: many do not support SR-IOV, and those that do may not have drivers that support it in older kernels such as CentOS 5.6. Unfortunately, since most Gigabit NICs were designed before SR-IOV, most don't support it. We have great difficulty getting hardware vendors to describe whether they provide SR-IOV-capable hardware or Linux drivers. The newer 10G NICs tend to support SR-IOV, but whether the server needs more than 1G is unclear, and 10G NICs are more expensive and use more power.

Good comparison of the servers; however, I couldn't help but think how much better it would be if we ran the actual workloads Facebook etc. plan to run in the datacenter vs. these enterprise workloads. How about running memcached / Hadoop / HipHop etc., which are the key workloads the OpenCompute servers are designed to run well?

Many of these workloads need large IO and memory vs. high compute. It will also be interesting to then use the same benchmarks to compare future servers based on technology from newbies like Calxeda, SeaMicro and AppliedMicro.

Xeon- and Opteron-based servers vs. ARM- and Atom-based servers. Now that battle of the old guard vs. the upstarts will be worth seeing.

Johan, thank you for an excellent article. I love to read about cutting-edge technology. Keep up the good work. But I noticed something that nobody in the comments has mentioned yet. In the last paragraph:

"... being inspired by open source software (think ..., ..., iOS, ...)."iOS is a Open Source Software?! When this happen?Reply

Since these systems are custom designed by Facebook engineers, I'm guessing you can't purchase anything like it, correct? Will that change with that foundation that Open Compute announced recently?

Getting Power-One to design a supply just right requires a LOT of testing. It's also strange to me that the supply only takes 200-277VAC. The Power-One AC supplies I'm familiar with do 90VAC to 264VAC and pass 80 PLUS Gold; maybe the tighter input range helps them tune it for more efficiency.