Posted by CmdrTaco on Monday August 01, 2011 @11:00AM
from the yeah-but-how-many-cores-chris dept.

1sockchuck writes "How many servers is Google using? The company won't say, but a new report places the number at about 900,000. The estimate is based on data Google shared with researcher Jonathan Koomey, for a new report on data center power use. The data updates a 2007 report to Congress, and includes a surprise: data centers are using less energy than projected, largely due to the impact of the recession (buying fewer servers) and virtualization."

We've moved from 1U systems with 90-125W CPUs to blade enclosures with 60W CPUs, and we're also getting 4 or 6 cores per physical CPU rather than 1 or 2. While our HPC cluster core count has increased by a factor of 4 (allowing researchers to do more work), the amount of energy and floor space required did not increase much at all.

Depends on how many numbers you have to crunch. I do not know Google's architecture well enough to tell, but typically x86 systems crumble under I/O, not lack of CPU cycles. SETI or distributed.net use cases are rather rare.

Uh, I believe we're talking about server situations, not consumers. Having a GPU on an ARM chip or on an x64/x86 chip is a non sequitur. Or did I miss something here? I fail to see where you come up with this shit considering even Intel is trying to make an ARM chip [tomshardware.com]. You think they're doing it because ARM supposedly doesn't run as well, or because of having GPUs on the chip? Hint: Intel is shitting their pants over ARM right now.

Yeah. Oracle supports x86 and 64-bit SPARC. The only "news" I could find about OpenJDK for ARM is that support is due to be dropped from IcedTea [gbenson.net]. So all I know is that whatever "Java" exists for ARM is abandoned at this point. Anyone who knows otherwise, please chime in. Do note that a full-featured Java should IMHO have both an interpreter and a JIT, and perhaps be in somewhat widespread use so that there'll be enough real-life test coverage. I wouldn't use it for any major ARM-based project at this point, unle

GPUs provide substantially faster floating point processing than a general purpose CPU. Putting these new Intel/AMD integrated chips in big iron supercomputers will give research teams orders of magnitude more computing power (for less power and money) than the current CPU-only based offerings.

As far as Intel trying to make an ARM chip, that's for an entirely different market.

Putting an integrated graphics chip onto a server CPU is a joke. Even if you put 32 integrated graphics cards onto a single 8-core server CPU, the flops will be shit in comparison to any discreet graphics card which costs orders of magnitude less.

I should have explained myself better. You're thinking about small blade servers doing simple tasks. Sure, for that the GPU is a complete waste. But that's not what I'm talking about. I'm not talking about a computer that's cranking out killer graphics. I'm talking about computers that do "real work" that is seriously floating point intensive. Far beyond what you'd find running on a small server.

Get that small blade server out of your head. Think big iron. Think supercomputers. Think racks of cards

Integrated GPUs don't really come anywhere near the flops of a discreet card (I don't get why people call it discreet instead of discrete, I always assumed that was the proper spelling for the term, anyway). Yes, it's better than the processor by itself, but no, it is not substantial. GPU clusters already exist; remember the one from China? One single GPU can perform about 20x as much as a single Xeon's GPU pretty easily. I understand what's needed for science research and understand supercomputer clusters

You really should read up on the new generation of integrated GPUs. They have come a long way in just the last year. AMD has integrated a full up ATI 6xxx GPU on die. Intel is making remarkable strides as well.

Having a discrete graphics card hanging on a PCIe bus may have been 20x faster last year. Integrated graphics were really inadequate when they were in the northbridge. Again, that was last year. Things have evolved significantly since then.

Is this a joke? I do read up on this continually, and have a 6970 at home and an E350 netbook. Not commenting on my work. It's not a "6xxx" GPU, it's equivalent to the lowest end of the 6 series GPU [wikipedia.org]. The highest end version is 1/10th that of the flagship graphics card and only has better performance than the lowest end of the flagship design from the last iteration. Also, that hasn't changed *since* at least the last two iterations (we're talking 4x series). I don't know what planet you're on but FLOPS do

Oh, and I forgot one more thing. The newest integrated CPU/GPU from AMD uses the same processor core that goes into the HD6xxx series chips coming from ATI. So, contrary to what you say, the FLOPS are not ". . . shit in comparison to any discreet [sic] graphics card . .."

GPU clusters and GPUs embedded into processors are not the same. Desktops might have iGPUs but server processors don't. Servers that do GPU clusters commonly use discreet graphics cards. Why do people associate the two? Do people not really understand that on an 8 processor chip, if you have a single integrated GPU (let alone 1 per core), its performance is still going to be shit in comparison to even a cheap discreet graphics card?

Hard to say. We were already moving to blade servers when we started the expansion. With the previous chassis servers with 90W CPUs, we'd have to get a rack rated at 30 kW rather than the standard 20. With the low power CPUs, we can easily fit in a 20 kW rack. Our data center folk (who really know the numbers) started to panic when we had a 20 kW rack 1/2 full of 1U systems.

In addition to this, Google runs DC power supplies with a low-voltage on-board battery, as opposed to a large rack UPS. I've heard they have some innovative tricks for server room cooling as well, but I've never seen confirmation of exactly what they're doing. But Google goes to great lengths to cut down data center power usage.

There are reports that Google has been testing servers [semiaccurate.com] using low-power many-core servers from Tilera and Quanta. Facebook is also test-driving Tilera chips [datacenterknowledge.com] and seeing promising results when using them on key-value pair apps like memcached. When you have 900,000 servers, you get plenty of attention from processor and server vendors.

It's actually a shame that perfectly working machines are being destroyed this way... while in Asia, Africa etc. people (schools, e.g.) would be more than happy to use those machines for 5 to 10 more years, at least.

Schools in the USA and Europe would be grateful for them too! Businesses often throw out machines as part of a 3-year rolling upgrade cycle, while schools are stuck with machines 5+ years old because they don't have budget for new ones.

Sure, but what do citizens like you and me do about it? Close to nothing, I'm afraid.
Aren't there about a billion PCs operating worldwide today?
What do we do to recycle a few hundred million of them each year?

So, why is that, exactly? Google has proven by its actions that their solution to the need for more processor power is just to add more servers. Granted, it'd be tough to pull one of the old blades and play the most recent edition of Duke Nukem, but really, these systems will still be able to crunch numbers for a very long time. Would buying faster systems with more cores allow a single system to crunch more? Sure, but really, those old systems can still happily serve their original purpose. As Googl

Not exactly. When the cost of not replacing hardware (increased power consumption, rack space, cooling requirements, etc.) exceeds the cost of replacing it, they replace it, even when the trashed hardware doesn't "actually break".
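
To put that rule in concrete terms, here is a rough sketch of the comparison. The function name and every figure in it are made-up placeholders (nobody in this thread has Google's real numbers); it only shows the shape of the decision: total cost of keeping the old box versus buying and running a new one over the same period.

```python
# Every figure here is a made-up placeholder, not a real Google number.
def should_replace(old_yearly_cost, new_yearly_cost, purchase_price, years):
    """True when keeping the old box costs more than buying and running a new one."""
    keep = old_yearly_cost * years                    # power, cooling, rack space
    swap = purchase_price + new_yearly_cost * years   # new hardware plus its running cost
    return keep > swap


# Power-hungry old server vs. an efficient replacement over a 3-year window.
print(should_replace(old_yearly_cost=1200, new_yearly_cost=400,
                     purchase_price=2000, years=3))
```

The longer the window and the bigger the gap in running costs, the sooner a working machine ends up on the scrap pile.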

Not sure where you work, but in all the places I've worked, hardware was stockpiled once it was put out of service. I'm not so sure about the reason, but there seem to be a lot of accounting issues for a company if it wants to get rid of stuff. Maybe someone more knowledgeable can comment on that.

I was able to recycle a Dell Inspiron 8500 back in 2005. It just died on Friday after 6 years of working well; the drive finally crashed. I figured I would fix it and give it to a non-profit that might be able to use it.

A lot of non-profits won't take donated systems anymore because it's a nuisance to deal with so many antiquated systems. Non-profits need working and reasonably contemporary systems to do their work; a bunch of 256 MB Win 98 systems is really more of an insult than a benefit.

Well that will depend on a lot of things. If a socket upgrade is available they can just put in a new CPU. Even if the "server" is taken out of service you are just talking about the mainboard, CPU, and memory. The CPU and memory might be offered for sale as "used" or "refurbished". The board will be recycled, the power supply and rest will probably be reused unless a more efficient solution is available. Servers tend to have a longer life than say cell phones, desktops, and PCs.

I am quite sure that not all of it is what we call "physical" servers. It's most likely a cluster of beefy hardware running a ton of VMs. As that hardware becomes obsolete, engineers will run fewer VMs on it and later move it out of the main production environment to handle less stressful tasks. It's common now to see several servers (48 cores, 512 GB RAM) running a few hundred virtual servers. So it will take a long time before that hardware is completely thrown away...

I am quite sure that not all of it is what we call "physical" servers.

Google has a different model. Quila mentioned what their hardware is like; here is a slightly outdated Wikipedia article describing it [wikipedia.org]. Google would prefer to avoid the overhead associated with running virtual machines. 10-20% overhead may not be a lot for an organization with smaller computing needs, but with Google that would mean adding another 90,000-180,000 servers. Their computing needs are way beyond what any individual server can do anyway.
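
The arithmetic behind that 90,000-180,000 figure is easy to check. This is only a back-of-the-envelope sketch: it takes the ~900,000 estimate from the summary and treats the overhead as a flat fraction of the fleet, which is the same simplification the comment makes. The function name is invented for the example.

```python
# Rough check of the "extra servers" claim. Assumes the ~900,000 estimate
# from the story and a flat overhead fraction (a simplification).
FLEET_SIZE = 900_000


def extra_servers(fleet_size: int, overhead_percent: int) -> int:
    """Servers needed to make up for a given virtualization overhead (in percent)."""
    return fleet_size * overhead_percent // 100


for pct in (10, 20):
    print(f"{pct}% overhead -> about {extra_servers(FLEET_SIZE, pct):,} more servers")
```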

OT Google is using all kinds of renewable sources for their energy. [nexteraene...ources.com]
Back OT Do you think they keep all their servers in mobile homes so they can keep the number of servers a secret?

Seems like they always keep what's in their locations a secret. My father was a manager at a distribution center for a fairly large national electrical supply chain, and several times people would come in to buy things for a complex they were building nearby. Apparently they worked for Google (they were always wearing Google shirts), and they were never allowed to say what they were building or what kind of work they were going to be doing.

I'm guessing only front-facing web servers get regular security patches. The rest might not get rebooted or patched at all if they replace the servers frequently enough (every 2-3 years). We are talking Linux servers here.

With that many servers, I'd tie the naming scheme to rack location. IP addressing would go in order along those racks.
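
A minimal sketch of what such a scheme could look like, purely as illustration. The row/rack/slot layout, the name format and the 10.0.0.0/8 base network are all assumptions picked for the example, not anything Google has described.

```python
import ipaddress

# Hypothetical layout: the sizes, the name format and the 10.0.0.0/8 base
# are illustrative assumptions, not anything Google has described.
RACKS_PER_ROW = 40
SLOTS_PER_RACK = 40
BASE_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def hostname(row: int, rack: int, slot: int) -> str:
    """Encode the physical position directly in the name."""
    return f"r{row:03d}-k{rack:02d}-s{slot:02d}"


def address(row: int, rack: int, slot: int) -> ipaddress.IPv4Address:
    """Walk the racks in order so neighbouring slots get neighbouring IPs."""
    index = (row * RACKS_PER_ROW + rack) * SLOTS_PER_RACK + slot
    return BASE_NETWORK.network_address + 1 + index


print(hostname(2, 7, 13), address(2, 7, 13))
```

The upside is that either the name or the IP tells a technician where to walk; the downside is that every physical move means a rename.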

I don't think that's that big of a problem, once you plan for having that many from the get go. All of those servers must be automatically provisioned, and their names are irrelevant and are machine generated. No one ever needs to know those names. Their management software probably manages servers by function. Say they have so many storage nodes, so many storage indexers, so many load balancers, so many static content servers, so many web spiders, etc. The configurations for any particular server must be generated, too, from some sort of a global configuration for their whole "system".
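
Something along these lines, as a toy sketch. The role names, counts and settings are invented for illustration; the only point is that per-server names and configs fall out of one global description and nobody types them by hand.

```python
# All role names, counts and settings below are invented for illustration.
GLOBAL_CONFIG = {
    "storage-node": {"count": 3, "settings": {"disks": 12}},
    "load-balancer": {"count": 2, "settings": {"backends": "web-spider"}},
    "web-spider": {"count": 2, "settings": {"threads": 64}},
}


def generate_server_configs(global_config: dict) -> list:
    """Expand the per-role description into one config dict per server."""
    configs = []
    for role, spec in sorted(global_config.items()):
        for n in range(spec["count"]):
            configs.append({
                "name": f"{role}-{n:06d}",  # machine-generated, never typed by hand
                "role": role,
                **spec["settings"],
            })
    return configs


for cfg in generate_server_configs(GLOBAL_CONFIG):
    print(cfg)
```

Growing a role is then a one-number change to the global description, and management tooling can select servers by the "role" field rather than by name.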

Virtualization is very inefficient compared to simply running multiple server processes on a single box, because each VM allocates resources to an instance of the OS, and RAM is more-or-less statically allocated between them. This makes sense when running several different services that each require a different operating environment, or to enforce complete user separation, e.g. a hosting service. But I would imagine google is running tens of thousands of identical servers running the same server daemon,

I would imagine google is running tens of thousands of identical servers running the same server daemon, so why would Virtualization make sense and save energy there?

Who said that Google uses virtualization to run identical servers?

Just running "git log --grep=virtualization" on the Linux kernel, you can see that Google has not contributed much to virtualization in the Linux kernel, in sharp contrast to other parts of the kernel such as ext4.

... that's a square 1000 x 1000 meters. Now place 5 "normal" computers next to each other in two layers and you need about 100 000 square meters. Divide 100 000 by 8 data centers (Atlanta, North/South Carolinas, Chicago, California, Oregon, Taiwan, Ireland) = 12 500 square meters per data center. If every data center has 2 floors then you need a building like 80 x 80 x 5 meters. And you still have enough room for the guys with wheelbarrows:) Anyway a data center like this would be about the size of an indus
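
For anyone who wants to redo the floor-space estimate above, here is a quick sketch. The start of the comment is cut off, so the starting point used here (about 1,000,000 servers at one square metre each, i.e. the 900,000 estimate rounded up) is an assumption read back from the 1000 x 1000 m square; the rest follows the comment's own numbers.

```python
import math

# Assumptions read into the comment: roughly 1,000,000 servers (the 900,000
# estimate rounded up) and one square metre per machine as the starting point.
servers = 1_000_000
per_square_metre = 5 * 2          # 5 side by side, stacked in two layers
data_centers = 8
floors = 2

total_area = servers / per_square_metre          # m^2 across all sites
area_per_site = total_area / data_centers        # m^2 per data center
side = math.sqrt(area_per_site / floors)         # m, assuming a square footprint

print(f"total {total_area:,.0f} m^2, per site {area_per_site:,.0f} m^2, side {side:.0f} m")
```

The side length comes out just under 80 m, so the 80 x 80 figure holds up under these assumptions.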