Google's custom-built servers currently use x86 processors from other companies.

Google has designed and built its own custom-made servers for years now, but those boxes still use components designed and built by other companies. That could one day change, however: citing the ever-nebulous "person with knowledge of the matter," Bloomberg reports that Google is looking into designing its own ARM-based chips for its custom servers.

We've already seen consumer technology companies like Apple and Samsung become more vertically integrated in the last few years—Apple designs its own phones and tablets, the chips that go in them, and the architecture that goes into the chips, for example. Just as Apple's software benefits from tight integration with Apple's hardware, Google is reportedly eyeing chip design as a way to "better manage the interactions between hardware and software." However, this is in no way a done deal. The same source telling Bloomberg about the chips notes that Google's plans aren't set and are subject to change.

The Bloomberg report goes on to frame Google's reported chipmaking ambitions as a threat to Intel, though this theory doesn't necessarily hold water. Google, Facebook, and other big companies already build their own servers, and yet the wider server market carries on. In any case, the battle between ARM and x86 for the server room won't begin in earnest until at least late 2014; that's when the first chips based on ARM's 64-bit Cortex-A53 and Cortex-A57 designs are due to begin shipping.

Though I guess that could be true in the short term, until ARM catches up performance-wise?

Edit: changed "performance per watt" to straight performance. I know x86 and POWER have better throughput, but once you bring power into it, it may depend on the use case... though everything in comparing architectures comes down to use case. So...

This is like saying that Apple is evaluating a bigger iPad. Of course they're evaluating it. Probably built something to simulate it. Doesn't mean it will ever happen, but it would be stupid not to look at what it would mean.

Google's massively parallel search workloads would probably benefit more from a huge bunch of low-power ARM processors than a few big Intel ones, even when running virtualization. The beauty of building your own ARM cores is that you can throw out the stuff you don't need and optimize for your workload.

Recently the BELT architecture was revealed, the first real innovation in CPU architecture in a long, long time, and one offering real benefits. Seems to me this architecture is a very Googly thing, and too good to pass up. Who knows, they may have secretly bought it.

Edit: well, thank you, downvoters. Ars Technica is a really nice place for those who add some context to a news item. Note that the BELT presentation you'll find on YouTube was given at Google. Stupid cows.

If they are doing it, they might as well make a version for phones/tablets; maybe this will make the traditional ARM chipset vendors straighten up their stuff and make better chipsets... One can only dream.

A custom SoC would make sense for Google, as it'd allow for higher I/O connectivity while dropping overall power consumption. However, I do not see this as the sole reason for designing their own SoCs. Even with the millions of servers Google has, the cost of designing a custom SoC is in the millions as well. It would drive down the initial hardware cost by not having to purchase premium components (i.e. Xeons for ECC support) from Intel. However, Google's major server costs stem mainly from powering these devices 24/7/365. The power savings and initial cost savings would have to outweigh the large design costs. Remember, there are no end consumers of Google server hardware; it is all internal.
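To put rough numbers on that trade-off, here's a back-of-the-envelope sketch; every figure in it (design cost, fleet size, watts saved, electricity price, per-box savings) is an assumption for illustration, not a real Google number:

```python
# Back-of-the-envelope: does a custom SoC pay for itself?
# All figures below are illustrative assumptions, not real Google numbers.

DESIGN_COST = 50e6        # assumed one-time SoC design cost, dollars
SERVERS = 1_000_000       # assumed fleet size
WATTS_SAVED = 20          # assumed power saving per server, watts
PRICE_PER_KWH = 0.07      # assumed industrial electricity price, $/kWh
HARDWARE_SAVED = 100      # assumed per-server saving vs. premium x86 parts, $

HOURS_PER_YEAR = 24 * 365
yearly_power_savings = SERVERS * WATTS_SAVED / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH
upfront_savings = SERVERS * HARDWARE_SAVED

print(f"Yearly power savings:     ${yearly_power_savings:,.0f}")
print(f"Up-front hardware savings: ${upfront_savings:,.0f}")
if DESIGN_COST <= upfront_savings:
    print("Design cost recouped immediately by cheaper hardware.")
else:
    years = (DESIGN_COST - upfront_savings) / yearly_power_savings
    print(f"Payback period: {years:.1f} years")
```

With these made-up inputs the design cost is recouped on hardware alone; the interesting cases are the ones where it isn't, and the answer then swings entirely on watts saved per box.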

The more likely reason Google would look into licensing ARM cores would stem from a desire to create their own SoCs for Android phones and tablets. This would allow the initial investment to be spread across multiple products, some of which would be end-user. I do not think Google would attempt to migrate Android to a hardware lock-in model (i.e. if you want Android, it'll run on a Google-designed SoC) due to the FTC coming down on them.

So I can see Google getting into the SoC game, but only if they were to offer a chip that'd be publicly used.

'Just as Apple's software benefits from tight integration with Apple's hardware, Google is reportedly eyeing chip design as a way to "better manage the interactions between hardware and software." '

I've never seen this proved. I've just seen hand-waving arguments that claim the custom ARM chips are key to the iPhone's success.

The problem is you don't know how much efficiency is due to hand-crafting the software.

Chip manufacturing is such a PITA that I think the amount of hand-crafting in the chip is overestimated. Remember, you don't patch hardware. It has to be right going out the door. Chip design/production is a totally different mindset than software development.

Is it any wonder that Intel is taking steps towards operating a merchant foundry business?

People always point to Intel's process supremacy as the ultimate reason why they won't be challenged in their traditional strongholds, but they forget that that process supremacy is paid for by the margins they command in their traditional strongholds. This worked for them when they could sell all the transistors they could make into those markets. These days, those markets are mature, and yet maintaining process supremacy requires investments that leave them with more and more transistors to sell.

Meanwhile, markets they failed to gain a major foothold in, like GPUs and mobile SoCs, have been gobbling up fab capacity to the point that those fabless semi customers are willing to pay premiums for increasingly advanced process tech. Now, some of Intel's biggest customers in one of their core market segments are looking at tapping into that ecosystem, which, in the long run, eats away at Intel's ability to maintain process supremacy, unless they start selling capacity to fabless semi companies.

It would be interesting to hear that Google is moving, because they are probably (if any company is) very well placed to encourage x86 vendors to be... accommodating... in terms of the margins they are willing to accept on chip sales.

Hearing that ARM is cheap enough to give Intel's Xeon markup team a case of nervous shakes is one thing. Hearing that it's efficient enough that Intel can't cut a huge customer a good enough deal to keep them from going elsewhere for better flops/watt, that'd be real news.

Google's massively parallel search workloads would probably benefit more from a huge bunch of low-power ARM processors than a few big Intel ones, even when running virtualization.

That's not trivially true. tl;dr: latency matters.

Google's workload is highly parallel, but it's not parallel like a numerical simulation or something. The parallelism comes from an unlimited number of queries that don't depend on each other. The parallelism within an individual query often isn't that high, and can come from something like querying several memory-caching clusters at once rather than from compute parallelism. If the actual CPU behavior is largely single-threaded and you slow down your threads, you can shoot yourself in the foot. User engagement suffers when you do that.

That's why we keep hearing about ARM servers but adoption never seems to be where it should be if throughput and power consumption were the only criteria.
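To make the latency point concrete, here's a toy model: a query fans out to a bunch of shards and has to wait for the slowest one, so slower single-threaded cores stretch the tail latency even when aggregate throughput is fine. The shard count and service times are invented for illustration:

```python
# Toy fan-out model: a query waits for the slowest of N shard lookups,
# so slower single-threaded cores stretch the tail, not just the average.
# All parameters are made up for illustration.
import random

def query_latency(n_shards, mean_ms, rng):
    # Each shard's service time is roughly exponential; the mean is what
    # the core's single-threaded speed controls.
    return max(rng.expovariate(1.0 / mean_ms) for _ in range(n_shards))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

rng = random.Random(42)
N_QUERIES, SHARDS = 10_000, 40

fast = [query_latency(SHARDS, 2.0, rng) for _ in range(N_QUERIES)]  # big core: 2 ms/shard
slow = [query_latency(SHARDS, 6.0, rng) for _ in range(N_QUERIES)]  # small core: 6 ms/shard

print(f"big cores   p99 = {p99(fast):.1f} ms")
print(f"small cores p99 = {p99(slow):.1f} ms")  # tail grows with the per-shard mean
```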

Chip manufacturing is such a PITA that I think the amount of hand-crafting in the chip is overestimated. Remember, you don't patch hardware. It has to be right going out the door. Chip design/production is a totally different mindset than software development.

Never heard of microcode/firmware?

Do you mean custom core or custom SoC? Hand-crafting the software for your customized SoC is what gives the best results: you can tune it for whatever mix you want of performance, power use, run time, etc.

Recently the BELT architecture was revealed, the first real innovation in CPU architecture in a long, long time, and one offering real benefits. Seems to me this architecture is a very Googly thing, and too good to pass up. Who knows, they may have secretly bought it.

Edit: well, thank you, downvoters. Ars Technica is a really nice place for those who add some context to a news item. Note that the BELT presentation you'll find on YouTube was given at Google. Stupid cows.

It's a shame that people are voting you down. I think it's awesome that Santa Claus has designed a new microarchitecture.

That's not trivially true. tl;dr: latency matters.

Google's workload is highly parallel, but it's not parallel like a numerical simulation or something. The parallelism comes from an unlimited number of queries that don't depend on each other. The parallelism within an individual query often isn't that high, and can come from something like querying several memory-caching clusters at once rather than from compute parallelism. If the actual CPU behavior is largely single-threaded and you slow down your threads, you can shoot yourself in the foot. User engagement suffers when you do that.

You are correct about latency, but what is actually the biggest latency bottleneck in your example? It isn't single-threaded CPU performance but rather reaching over the network to the memory-caching clusters. This is where a custom SoC could really shine, as Google would have control over I/O. For example, the Ethernet controller can be moved on-die alongside dedicated TCP-acceleration hardware.

Similarly, storage controllers can move on-die. Here is where some very interesting things can happen if they choose to go all out. Moving storage to SSDs is the first obvious step, and that's regardless of what chip Google uses. The next step would be to optimize the CPU-to-storage link, and with an SoC, nearly everything can be moved on-die. For consumers this would mean dropping SATA in favor of SATA Express, but Google could go a step further and just incorporate the SSD controller directly on die for their own datacenter usage. Flash storage then moves to a DIMM-like form factor with multiple ONFI links. I imagine a DIMM-like card could host 1 TB of usable flash per slot, with 32 slots per SoC. Beyond this is tweaking the processing on the SoC to be storage focused. Imagine two ARM cores per on-die SSD controller.

Then finally the software side is tackled. One of the most unique things Google has done internally is develop its own file system for search. I would explore the idea of treating the flash storage as if it were directly addressable memory. That would nearly remove the software barrier entirely.*

*I would still have a thin layer that would need to account for write endurance and NAND provisioning.
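For the curious, the "thin layer" in that footnote would look roughly like a minimal flash translation layer. This is a schematic sketch under my own assumptions, not anything Google has described:

```python
# Minimal sketch of the "thin layer" from the footnote: a flash
# translation layer that remaps logical pages to physical pages and
# spreads writes for endurance. Purely illustrative, not a real FTL.

class ThinFTL:
    def __init__(self, physical_pages, overprovision=0.1):
        self.free = list(range(physical_pages))       # physical pages available
        self.logical_map = {}                         # logical page -> physical page
        self.erase_counts = [0] * physical_pages      # wear tracking per page
        self.reserve = int(physical_pages * overprovision)  # NAND provisioning

    def write(self, logical_page, data):
        if len(self.free) <= self.reserve:
            raise RuntimeError("would trigger garbage collection (omitted)")
        # Wear leveling: always write to the least-worn free page.
        self.free.sort(key=lambda p: self.erase_counts[p])
        phys = self.free.pop(0)
        old = self.logical_map.get(logical_page)
        if old is not None:
            self.erase_counts[old] += 1   # old copy must be erased before reuse
            self.free.append(old)
        self.logical_map[logical_page] = phys
        # ... program `data` into physical page `phys` over ONFI here ...

ftl = ThinFTL(physical_pages=1024)
ftl.write(0, b"hello")
ftl.write(0, b"hello again")  # rewrite lands on a different physical page
```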

Recently the BELT architecture was revealed, the first real innovation in CPU architecture in a long, long time, and one offering real benefits. Seems to me this architecture is a very Googly thing, and too good to pass up. Who knows, they may have secretly bought it.

The problem with BELT is that it is not entirely new. The SPARC architecture has a sliding register window to accelerate function calls. The other reason to be really skeptical on performance is that it is a VLIW architecture, which tends to favor highly parallel workloads. VLIW scaling for serial workloads is not impressive, as seen with the Itanium line.

Recently the BELT architecture was revealed, the first real innovation in CPU architecture in a long, long time, and one offering real benefits. Seems to me this architecture is a very Googly thing, and too good to pass up. Who knows, they may have secretly bought it.

Edit: well, thank you, downvoters. Ars Technica is a really nice place for those who add some context to a news item. Note that the BELT presentation you'll find on YouTube was given at Google. Stupid cows.

You're being downvoted because: (a) (politely) raising new architectures is irrelevant to the discussion.

(b) (realistically) MILL is NEVER going to get built. It's a bunch of interesting ideas that are FAR, FAR too different from the mainstream to ever attract the sort of real cash it takes to build a CPU. Look at Itanium, for god's sake. You might not have learned anything from that debacle, but the sort of people who sign the billion-dollar checks necessary to create a new CPU HAVE learned from it.

MILL is fan fiction for people who care about CPUs. Some of its ideas MAY (after a long period of evaluation and modification to the "traditional" CPU world) eventually make it into a product, but not soon, and not the whole CPU.

Do you mean custom core or custom SoC? Hand-crafting the software for your customized SoC is what gives the best results: you can tune it for whatever mix you want of performance, power use, run time, etc.

Or you just hand-craft your software for a stock ARM chip.

Let's look at the great successes in proprietary CPUs: Sun SPARC, MIPS, and PA-RISC, to name a few. (Hint: this is sarcasm.) NexGen would have been a footnote in history if AMD hadn't bought them.

In the meantime, your custom ARM chip needs a custom probe card, DUT board, test program, burn-in, yada yada yada. Everything you need to be in the chip business.

What we don't know is the extent of what ARM is doing for Apple, etc. While Apple may claim to be in the chip business, it may be more like they cause chips to be made.

What we don't know is the extent of what ARM is doing for Apple, etc. While Apple may claim to be in the chip business, it may be more like they cause chips to be made.

At this point, ARM is just handing the architectural reference for the ISA to Apple. The actual CPU core is of Apple's own design. This is one of the reasons why Apple was able to come out first with a 64-bit ARM chip.

And Apple is under contract to continue to provide PowerPC-based SoCs from the PA Semi acquisition. Military contracts were in place beforehand that continue to be honored.

It's a shame that people are voting you down. I think it's awesome that Santa Claus has designed a new microarchitecture.

It's not a new MICROarchitecture, it's a new ARCHITECTURE. The difference is important. A µarch is simply a new way of doing things that are already well understood: the languages, compilers, and algorithms all exist and are understood. An architecture doesn't just take on the risk of new circuits and new ways of linking gates together; it takes on the risk of a new compiler, new algorithms, new libraries (e.g. OS primitives and math), and in the worst case even a new language to really exploit the architecture.

As I said, we've already seen two massive failures on this front: Itanium and Cell, both of which had vast amounts of money behind them, but just couldn't achieve critical mass.

You CAN be successful if you can find an important enough niche where the improvement is good enough --- GPUs are the obvious case --- but servers seem highly unlikely to meet the criteria that the improvement available is so immense it's worth boiling the ocean for it. The best you seem able to do is precisely the sort of thing IBM has done with POWER --- add on specialized co-processors to handle well-defined tasks that are common enough, like decimal FP arithmetic, encryption, or some aspects of XML processing.

Edit: well, thank you, downvoters. Ars Technica is a really nice place for those who add some context to a news item. Note that the BELT presentation you'll find on YouTube was given at Google. Stupid cows.

What? You give original thought in your posts, thinking for yourself, and provide interesting links to cool new things? WELL THEN, have a bloody downvote from me, how dare you! </sarcasm>

Thanks for the links; new architecture stuff is really fascinating. And know that you are not alone in noticing the weird downvoting in comment threads. It almost seems like people come for shallow jokes and dislike anyone but the article's authors actually making them think. But then sometimes it's just fine. I've given up trying to predict it, or caring. I think for myself and try to offer substance, and anyone who objects can hate all they want; it won't stop me, and I know there are others who sincerely appreciate the contribution.

... but Google could go a step further and just incorporate the SSD controller directly on die for their own datacenter usage. Flash storage then moves to a DIMM-like form factor with multiple ONFI links. I imagine a DIMM-like card could host 1 TB of usable flash per slot, with 32 slots per SoC. ...

I love it, and it should be immediately adopted. SSDs are pretty much standard now, but cramming them through SATA and wires and into separate boxes is silly.

I keep waiting for memristor-based RAM. I want a computer where all storage is MRAM, running at speeds similar to our current DDR3, but nonvolatile and measured in TB instead of GB. That would be a real next-gen machine. Fuck all this drive and file-system crap; everything you have is always loaded, and it's only a matter of paying attention to it.

Of course, the next step is processing memory, i.e. the memory executes code directly, at any and all memory locations, in parallel. That's a far-distant paradigm, I think.

Storage = memory, that's the dream. There's no need to load stuff from storage if it's all directly accessible as memory. You would need the speed of DRAM and the power-off retention of flash memory.
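Today the closest software gets to that is memory-mapping a file: the OS pages storage in and out behind what looks like a flat array. A sketch (the file name is just a stand-in for hypothetical NVRAM):

```python
# Sketch of "storage addressed like memory": memory-map a file and read/
# write it with ordinary indexing, no explicit load/save step. With true
# nonvolatile RAM the page cache and the flush would disappear entirely.
import mmap, os

PATH = "storage.bin"         # illustrative file standing in for NVRAM
SIZE = 1 << 20               # 1 MiB of "persistent memory"

with open(PATH, "wb") as f:  # size the backing store once
    f.truncate(SIZE)

fd = os.open(PATH, os.O_RDWR)
mem = mmap.mmap(fd, SIZE)    # storage now looks like a flat byte array

mem[0:5] = b"hello"          # a "store" that is also durable
print(bytes(mem[0:5]))       # a "load" straight from the mapping

mem.flush()                  # with real NVRAM, no flush would be needed
mem.close()
os.close(fd)
```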

Google bought a company called Agnilux back in 2010. The company was founded by people who worked at PA Semi and then Apple; recall that PA Semi had developed a PowerPC CPU called PWRficient that was intended for embedded and server systems. Agnilux was rumoured to be developing a similar ARM-based design for servers. Of course that doesn't mean the project continued; Google might simply have wanted some talented engineers. But it does suggest that Google was pondering the issue of fully custom systems years ago.

You know, maybe Google just needs to contact Adapteva and buy into their company. They are making parallel computing possible for everyone. Their 16-core boards only cost $99, and they built a Beowulf cluster for about $4,836 that used under 500 watts of power. That's pretty efficient if you ask me.
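Quick sanity check on those numbers, assuming the cluster was built from those $99 16-core boards (the post implies it but doesn't say so):

```python
# Rough per-core economics of the cluster described above, assuming it
# was built from the $99 16-core boards (an inference, not a given).
cluster_cost = 4836       # dollars, from the post
board_cost = 99           # dollars per 16-core board
cores_per_board = 16
power_watts = 500         # "under 500 watts", from the post

boards = round(cluster_cost / board_cost)   # roughly 49 boards' worth of budget
cores = boards * cores_per_board
print(f"~{boards} boards, ~{cores} cores")
print(f"~${cluster_cost / cores:.2f} per core, ~{power_watts / cores:.2f} W per core")
```

Well under a watt per core, if those assumptions hold.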

... but Google could go a step further and just incorporate the SSD controller directly on die for their own datacenter usage. Flash storage then moves to a DIMM-like form factor with multiple ONFI links. I imagine a DIMM-like card could host 1 TB of usable flash per slot, with 32 slots per SoC. ...

I love it, and it should be immediately adopted. SSDs are pretty much standard now, but cramming them through SATA and wires and into separate boxes is silly.

There is a standard for this that has been proposed with JEDEC and a flash consortium. I just haven't seen any product using it, probably because so many things are currently in flux with regard to SSD storage. NVMe on the server side has just been rolled out over the past year on PCIe cards. A DIMM-based format would be ideal, with an NVMe controller placed on a motherboard or as part of a chipset. Moving the SSD controller onto the CPU die only makes sense for big data sets that need to be repeatedly read (I would argue Google falls into this niche, per my example above).

While it is inefficient to use SATA and separate boxes, there are support advantages on the consumer side. The SATA Express and M.2 form factors do provide a nice benefit but are just hitting the market. Further theoretical speed increases are marginally beneficial on the consumer side.

I keep waiting for memristor-based RAM. I want a computer where all storage is MRAM, running at speeds similar to our current DDR3, but nonvolatile and measured in TB instead of GB. That would be a real next-gen machine. Fuck all this drive and file-system crap; everything you have is always loaded, and it's only a matter of paying attention to it.

Current magnetoresistive RAM is great for its performance, but its low density and high cost prohibit its usage in most scenarios. (Servers would favor higher-density parts, whereas consumer products are very cost sensitive.) I can still see magnetoresistive RAM as a viable replacement for SRAM in CPUs, but it'd have to be able to scale down to modern geometries (last I heard, MRAM is built on 180 nm lines). The one current niche it'd be good at is SSD write buffers.

Memristors are neat, but no one has committed to using the technology in an end product. I'm not sure if/how current manufacturing lines would need to be adapted to utilize them. This puts the technology years away from usage. I do think it'll play a very significant role when it does reach the market, as that will be when traditional lithography hits a scaling limit.

Of course, the next step is processing memory, i.e. the memory executes code directly, at any and all memory locations, in parallel. That's a far-distant paradigm, I think.

I generally agree, but there are various nuances that need to be addressed. Using nonvolatile memory is necessary for capacity and performance reasons. The catch here is that current NAND has a finite write endurance. It is possible to work around this by buffering writes so that they're performed in optimal chunks. The buffer would ideally be nonvolatile for power-loss scenarios (MRAM?), but battery-backed DRAM would suffice.

NAND cell failure would have to be worked around too. This prevents current NAND from being addressed contiguously. There would need to be an MMU to virtualize a contiguous address space, as well as a translation buffer to piece non-linear regions together. These concepts are not new, but this would be a new application of these ideas.

DRAM and SRAM wouldn't go away in this scenario, as they'd still be used for caching at various levels to speed up the entire system. Here is where some abstraction is still necessary. The data in these caches would need to be written back to storage memory at some point. Coherency is always an issue with systems this large, too. Though these problems exist today and do have solutions (with varying trade-offs).

The last aspect is the software side. Some research has been going into this of late (HP paper (PDF)). The real problem is that going to this storage model is fast if the file system is removed entirely, but applications still have to play nice within it. What happens when an application has a memory leak? What do you scrub when an application crashes? At the very least, applications would need to be rewritten for this model. For Google this is rather trivial, as they're constantly refining their own software, but it would be a barrier for consumer adoption elsewhere.

One last realization about this is that once you can start addressing storage like you do memory, scaling becomes interesting. 64-bit addressing provides for 16 exabytes of space. The catch with Google is that they use a distributed system with massive amounts of replication. Addressing their entire cluster as contiguous memory may feasibly exceed 16 exabytes a decade from now at Google's current growth rates. (Google's web index is only 100 petabytes today.) If Google were to create their own architecture for their data usage, going with 128-bit addressing may be forward-thinking enough to be justifiable. As geeky awesome as a custom 128-bit architecture would be, I just don't see Google defining a fully custom ISA, as there is currently no cost-benefit to this level of customization.
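The arithmetic behind that, with the growth rate as the one invented input:

```python
# How long until a flat address space outgrows 64 bits? The 100 PB index
# figure is from the post; the yearly growth rate is an assumed input.
PB = 2**50                   # 1 PB (binary), bytes
index_today = 100 * PB       # ~100 petabytes (from the post)
limit_64bit = 2**64          # 16 EiB: everything a 64-bit address can name

growth_rate = 0.5            # assumed: data grows 50% per year
size, years = index_today, 0
while size < limit_64bit:
    size *= 1 + growth_rate
    years += 1

print(f"64-bit space exhausted in ~{years} years at {growth_rate:.0%}/yr growth")
print(f"128-bit space: 2**128 = {2**128:.3e} bytes, effectively unbounded")
```

At that assumed rate the 64-bit ceiling is a bit over a decade out, which is roughly the post's point.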

In the MBA world, folks are taught to buy off-the-shelf things to bash together to make a completed product, because it saves money. (Someone else makes the sub-components, and you get a discount for buying in bulk.)

But when you reach the size of Google, you get a greater benefit from doing it all yourself.

Isn't that what Hyundai was doing? Every time they needed to do something, they got into the industry of it. "We need ships!" They decided to make their own ships. "We need cars!" They make their own cars. Granted, they profit from this when they not only eat their own dogfood but sell it to others. And they can tightly integrate their whole industry using their own products.

Never, because they already sold the factory needed to produce said object. And why take over a company when all you'll get is hate mail for closing it down because it doesn't fit within your corporate structure?

Maybe this is one of those "we want a discount, and we have the money to make a serious threat to switch away from x86" moves.

But when you reach the size of Google, you get a greater benefit from doing it all yourself.

Isn't that what Hyundai was doing? Every time they needed to do something, they got into the industry of it. "We need ships!" They decided to make their own ships. "We need cars!" They make their own cars. Granted, they profit from this when they not only eat their own dogfood but sell it to others. And they can tightly integrate their whole industry using their own products.

I guess it makes sense if you're big enough.

Hyundai is a bad example though, because Hyundai was tasked with developing industrial capacity and provided with capital by the government of South Korea as part of a state project to industrialize the country's economy.

In the 1960s no one in South Korea was capable of making high quality steel in industrial quantities or of building commercially-competitive ships. But the government of South Korea decided to do these things, acquired foreign investment by shipping large numbers of Koreans overseas to work as miners and nurses, and transferred the money to selected companies for the purpose of developing industrial capacity in these areas.

It didn't make any real business sense (even though it has paid off handsomely in the long run), but it was done as a political-economic gamble to develop the economy.

At this point, ARM is just handing the architectural reference for the ISA to Apple. The actual CPU core is of Apple's own design. This is one of the reasons why Apple was able to come out first with a 64-bit ARM chip.

No, the Apple modifications are large enough to be considered a separate model instead of a modification of a specific ARM reference model, but they are still based on ARM reference designs. Heck, consider Intel: even the latest 4th-gen Core is in some sense related to the Pentium Pro. You don't start over from scratch on CPU design. Not even in distorted-reality land.

And ARM has had 64-bit designs for sale for years; Apple was just the first to have a design based on them manufactured (the 64-bit designs have worse performance/watt profiles, so most ARM users were not interested in them).

A custom SoC could:

+ decrypt what's coming in
+ process what's received
+ encrypt what's going out
+ transmit the encrypted data via an improved SSL-style encryption (not presently hacked by the NSA)
+ maybe even express the encryption within the SoC itself
+ control the veracity of the random-number generator
+ control the veracity of the RSA, ECC, AES, and Diffie-Hellman algorithms for asymmetric, symmetric, and perfect-forward-secrecy encryption
+ be able to tell all international clients that their data is secure and uncompromisable down to the SoC level
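On "control the veracity of the random-number generator": the simplest published check is the monobit (frequency) test from NIST SP 800-22. A sketch, with os.urandom standing in for the hardware RNG an SoC would be vetting:

```python
# Sketch of the simplest RNG sanity check, the NIST SP 800-22 monobit
# (frequency) test: in good random data, ones and zeroes should balance.
# os.urandom stands in for the hardware RNG an SoC would be verifying.
import math, os

def monobit_p_value(data: bytes) -> float:
    n = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    s = abs(2 * ones - n) / math.sqrt(n)   # normalized bias of the bitstream
    return math.erfc(s / math.sqrt(2))     # p-value; tiny value => biased RNG

sample = os.urandom(125_000)               # 1,000,000 bits, per the NIST spec
p = monobit_p_value(sample)
print(f"monobit p-value: {p:.4f}", "PASS" if p >= 0.01 else "FAIL")
```

A real on-die health check (along the lines of SP 800-90B) would run continuously and cover far more than bit balance; this is just the idea in miniature.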
