Wouldn't it be interesting to program something with millions of cores, where you can write some bit of code that runs simultaneously on all cores, which are differentiated by their enumeration and/or random numbers? That's where you'd need a good PRNG.

I can see how that concept could be economical in some particular high-performance application where the other option is a customer building their hardware with a whole wafer's worth of packaged devices. It dispenses with the cutting and packaging. The issue would be heat. Take as a crude estimate that a high-performance CPU like we use in a desktop draws around 100 watts and that you can get, let's say, 500 per wafer. If the power used is similar, that's something like 50,000 watts. (If the power consumption is really 50 watts, or I can only get 200 per wafer, it honestly does not change the analysis; it's still a huge amount of power.) Let's say it's going to be somewhere between 10 kW and 100 kW. That's quite a bit of heat to get rid of. It's going to require some sort of chilled liquid cooling.
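
The crude estimate above is just dies per wafer times watts per die. A minimal sketch of that arithmetic, using only the figures from the post (the function name `wafer_power_watts` is made up for illustration):

```python
def wafer_power_watts(dies_per_wafer: int, watts_per_die: float) -> float:
    """Total power if every die on an uncut wafer ran at watts_per_die."""
    return dies_per_wafer * watts_per_die


# The two cases mentioned in the post: 500 dies at 100 W, and 200 dies at 50 W.
high = wafer_power_watts(500, 100.0)
low = wafer_power_watts(200, 50.0)

print(f"pessimistic case: {high / 1000:.0f} kW")
print(f"optimistic case:  {low / 1000:.0f} kW")
```

Either way the result is in the tens of kilowatts, which is the point being made.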

Here's an interesting thought experiment. I sort of imagine something about an inch thick, on a rack, so figure about 50 of these on an equipment rack. That's going to be between 50 kW and 500 kW (taking 1 kW to 10 kW per unit), not including the pumps and chillers. Now consider a whole floor filled with those racks and their associated plumbing, power cables and pumps. Say 500 racks. That's between 25 MW and 250 MW. Now consider a 10-story office building filled with these on every floor. That's between 250 MW and 2.5 GW.

It's interesting how, if you try to pack a massive amount of computing power into a rather small space, you quickly become limited by the available power and your ability to dissipate heat. The office-building-sized supercomputer ends up needing its own dedicated power plant and probably one or more massive cooling towers, like they put on those power plants, just to dissipate the heat generated by the electronics.

If you think about it, you can use a similar calculation to estimate the upper limit of the computing power that government entities like the NSA could possibly have at their disposal. It's hard to hide a 2,500 MW power station. "Uhhhh, that twin-reactor nuclear power station over there, that's obviously running 24-7 at max output, and is quite obviously not delivering power to the power grid, but instead has power lines that go to just one building, one building that produces enough heat that it requires its OWN cooling tower and that you can see the infrared glow from with a simple pair of IR goggles... from the moon... That's not part of a government black project or anything."
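
The scaling in the thought experiment can be sketched as below. The counts (50 units per rack, 500 racks per floor, 10 floors) are from the post. Note that the quoted 50 kW to 500 kW per rack corresponds to 1 kW to 10 kW per unit; plugging in the 10 kW to 100 kW from the earlier wafer estimate would make every figure ten times larger. The function name `scale_up` is hypothetical.

```python
UNITS_PER_RACK = 50
RACKS_PER_FLOOR = 500
FLOORS = 10


def scale_up(watts_per_unit: float) -> dict:
    """Electrical load at rack, floor and building level (pumps and chillers excluded)."""
    rack = watts_per_unit * UNITS_PER_RACK
    floor = rack * RACKS_PER_FLOOR
    building = floor * FLOORS
    return {"rack": rack, "floor": floor, "building": building}


for per_unit in (1_000.0, 10_000.0):
    s = scale_up(per_unit)
    print(
        f"{per_unit / 1e3:.0f} kW per unit: "
        f"rack {s['rack'] / 1e3:.0f} kW, "
        f"floor {s['floor'] / 1e6:.0f} MW, "
        f"building {s['building'] / 1e9:.2f} GW"
    )
```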

However, the x86 chips are power behemoths compared to the ARM alternatives. Take an Apple A12 in an iPhone or iPad Pro. Probably similar processing power, but probably under 1W.

Now expand that to a very tuned, cut-down CPU in that giant wafer and the power is probably <<1W. True, there are 400,000 of them, though. But there are no driver interface chips linking the CPUs or memory, aside from whatever interface there is. A lot of power is chewed up by the external interfaces.

Obviously they now need to mount that wafer onto something to cool it. Perhaps they’ll bond it directly to an aluminium block that can have cooling tunnels built in. After all, it’s easier to cool one big 215x215 mm aluminium block than a lot of tiny blocks.

Given that an Intel i9-9980XE is rated at 165W for 18 cores, I think even 10W is an exaggeration for power per core from Intel for high-performance chips.
Also, these are general purpose processors, while the Cerebras WSE is AI-optimized.

The rule of thumb, given to me by a University professor in the field about ten years ago, was that for an n-core superscalar out-of-order execution CPU about 1/(2n+1) of the power consumed was in the I/O pads. This in part explains the rise of multi-core processors.
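
Taking the quoted rule of thumb at face value, a few values show how quickly the I/O pad share shrinks as cores are added. This is only a sketch of the 1/(2n+1) expression as stated; the function name `io_pad_fraction` is made up here, and the rule itself is the professor's approximation, not something verified.

```python
from fractions import Fraction


def io_pad_fraction(n_cores: int) -> Fraction:
    """Share of total power spent in I/O pads under the quoted 1/(2n+1) rule."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return Fraction(1, 2 * n_cores + 1)


for n in (1, 2, 4, 18):
    share = io_pad_fraction(n)
    print(f"{n:2d} cores: {share} of total power ({float(share):.1%})")
```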

As Ray has said, this wafer scale chip eliminates most of the I/O pads (and drivers) required for something like this, so I'd think the power dissipation is going to be at least an order of magnitude less than your lower bound.

That said, a 1kW chip is still going to be an interesting exercise in power supply and cooling design.

Standard 19" racks are measured in rack units (RU) of 1.75". So at 1" per wafer unit, you could physically fit 3 wafer units in 2RU. Normal rack heights top out at about 42RU, so taking into account space for power and cooling distribution, 50 per rack is probably a fair upper bound.
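
As a quick check of the rack arithmetic, here is a small sketch. The 1.75" rack unit, 42RU height and 1" unit thickness are from the post; packing in 2RU groups versus packing continuously are two assumptions added for comparison, and `units_that_fit` is a hypothetical helper.

```python
RU_INCHES = 1.75
RACK_RU = 42
UNIT_INCHES = 1.0


def units_that_fit(rack_units: int, unit_height_in: float = UNIT_INCHES) -> int:
    """Whole units of the given height that fit in a span of rack units."""
    return int(rack_units * RU_INCHES // unit_height_in)


per_2ru = units_that_fit(2)                  # units in one 2RU slot
grouped = (RACK_RU // 2) * per_2ru           # rack filled with 2RU slots
continuous = units_that_fit(RACK_RU)         # units stacked with no slot boundaries

print(f"{per_2ru} units per 2RU, {grouped} per rack in 2RU groups, "
      f"{continuous} if stacked continuously")
```

Both totals sit above 50, so 50 per rack leaves some height for power and cooling distribution.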

At 1kW per wafer, that's 50kW per rack. If you are going with the "government black project" approach, at 250MW for the facility, you build it adjacent to the nuclear power plant with the power feed and facility underground, and you use the power plant cooling stacks to deal with the waste heat. Or, to go a little greener, you use the waste heat to drive a turbine to run the pumps to distribute the liquid coolant.

Otherwise you build it on the surface, connect it to the grid, and put some fake Google or Amazon signage on it. Problem solved!

There should be no problem in mounting 4 of these on a plane for rack mounts. Remember, a rack is 19" wide (less mounting) and the chip is ~6.5" square.

If the chips are mounted on an aluminium block with tubing connections for cooling (think air-conditioning style cooling) it probably would not be all that expensive. The problem with x86 chips is that they are so small compared to the heatsinks that attach to them.

My estimate was not based on cores. It was based on an (admittedly crude) estimate of how many dies you can cut out of a wafer, thus estimating the power density per square mm of a high-performance desktop or server part and then assuming you just didn't bother to cut them up. The pictures I've seen of the predecessor to the 9980XE showed a die that was huge. An estimate from the photograph gives a die of 25 mm x 25 mm, or 625 mm^2. So we have 165 W / 625 mm^2, or about 264 mW/mm^2. There are about 70,700 mm^2 in a 300 mm wafer. That gives roughly 18.7 kW. Now, it IS in fact true that the power dissipation can be whatever you want it to be. If you want that thing to run on 100 W, you can do that. But the entire idea of it is to make it fast. As fast as you can get it. And even the Core i9, and in fact ANY modern CPU, is thermally limited.
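
The power density estimate can be reproduced in a few lines. The 165 W rating and the 25 mm x 25 mm die size (itself read off a photograph) come from the posts; the wafer is treated as a full 300 mm circle with no edge exclusion, which is a simplifying assumption.

```python
import math

DIE_POWER_W = 165.0          # rated power quoted for the i9-9980XE
DIE_SIDE_MM = 25.0           # die edge estimated from a photograph
WAFER_DIAMETER_MM = 300.0

die_area_mm2 = DIE_SIDE_MM ** 2
density_w_per_mm2 = DIE_POWER_W / die_area_mm2
wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
wafer_power_w = density_w_per_mm2 * wafer_area_mm2

print(f"die area:      {die_area_mm2:.0f} mm^2")
print(f"power density: {density_w_per_mm2 * 1000:.0f} mW/mm^2")
print(f"wafer area:    {wafer_area_mm2:.0f} mm^2")
print(f"wafer power:   {wafer_power_w / 1000:.1f} kW")
```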

I have equipment here at work that consists of units about the size of two washing machines. It's filled with high-speed digital and analog electronics for testing semiconductors. The power supply unit is in a rack that sits beside it, the size of two large refrigerators. It is fed with 80 amps at 440 V. The test head (the thing the size of two washing machines) has boards, packed in with no space between them, with aluminum cooling jackets attached. They are cooled with a fluorocarbon coolant that is chilled to about -40C. (The chiller is itself cooled by a supply of chilled water at about 5C.) I think that if you go to the trouble, and expense, of making a processor on an entire wafer then you're going to make something that looks a lot like that test head rather than something that looks like a desktop PC or a conventional server.

And that CPU is very expensive. For example, you get about 100 of those i9 CPUs off a wafer; at 2k apiece, that's $200,000. If you do the same calculation with a die perhaps a quarter or a fifth of that size, like the smaller processors, you end up with CPUs that sell in the $200-500 range. It becomes clear that they charge for the die area, before they pile all the other variables on to sell to specific market segments. For the most basic system with a full-wafer processor, figure 200-300k to make the processor, another 100k for the support equipment like the chiller and power supply, and sell it for a million or two. But that's a whole other issue. My point was that the power consumption rapidly limits what you can do.
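
To see where "about 100 per wafer" sits, here is a rough sketch of two die-count estimates for a 625 mm^2 die on a 300 mm wafer: a plain area ratio, and a commonly used approximation that subtracts an edge-loss term. Neither accounts for yield, scribe lanes or edge exclusion, the edge-loss formula is an assumption brought in here rather than anything the posters used, and the $2,000 price is the round figure from the post.

```python
import math

WAFER_DIAMETER_MM = 300.0
DIE_AREA_MM2 = 25.0 * 25.0
PRICE_PER_CHIP_USD = 2000


def gross_dies(diameter_mm: float, die_area_mm2: float) -> int:
    """Upper bound: wafer area divided by die area."""
    return math.floor(math.pi * (diameter_mm / 2) ** 2 / die_area_mm2)


def edge_corrected_dies(diameter_mm: float, die_area_mm2: float) -> int:
    """Common approximation: area ratio minus a term for partial dies at the edge."""
    area_term = math.pi * diameter_mm ** 2 / (4 * die_area_mm2)
    edge_term = math.pi * diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(area_term - edge_term)


for label, count in (
    ("area ratio", gross_dies(WAFER_DIAMETER_MM, DIE_AREA_MM2)),
    ("edge corrected", edge_corrected_dies(WAFER_DIAMETER_MM, DIE_AREA_MM2)),
):
    print(f"{label}: {count} dies, ${count * PRICE_PER_CHIP_USD:,} at list price")
```

The post's figure of about 100 dies, and so roughly $200,000 per wafer, falls between the two estimates.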

However, my estimate was based on a 300 mm wafer. I think that's where it goes. If you're going to go to that trouble, go big or go home. But at 6.5" square, you get about 7.2 kW. That's still going to take a lot of effort to get rid of, especially if you put several on each rack-mounted unit.
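
The 7.2 kW figure for a 6.5" square follows from the same i9-derived power density. The sketch below also works backwards to the density a 1 kW part of that size would need, since 1 kW is the figure suggested earlier in the thread; that reverse calculation is an addition for comparison, not something from the posts.

```python
INCH_MM = 25.4
I9_DENSITY_W_PER_MM2 = 165.0 / 625.0   # same estimate as for the 300 mm wafer

side_mm = 6.5 * INCH_MM
area_mm2 = side_mm ** 2
power_at_i9_density_w = area_mm2 * I9_DENSITY_W_PER_MM2

# Density needed for the same area to dissipate only 1 kW.
density_for_1kw = 1000.0 / area_mm2
ratio = I9_DENSITY_W_PER_MM2 / density_for_1kw

print(f"area:                 {area_mm2:.0f} mm^2")
print(f"power at i9 density:  {power_at_i9_density_w / 1000:.1f} kW")
print(f"density for 1 kW:     {density_for_1kw * 1000:.1f} mW/mm^2")
print(f"i9 density is {ratio:.1f}x higher")
```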

So long as you can get Google or Amazon to go along, that would certainly hide it. Even then, however, you end up limited in how big you can go by the same constraints. You can build one or maybe two or three such facilities. So you can scale by small integer multiples, but you can't scale by orders of magnitude. 10 or 100 or 1000 of those facilities would still end up being impractical.

A wafer of chips only costs a few hundred dollars at 180 nm in volume, and this includes dicing into ~1000 dice. The cost to package the dice is comparatively larger.
So even if this chip cost $1000 each, it’s not bad. Of course there is the R&D to recover, and profit too.
Have you seen Intel’s profit? Those chips selling for hundreds of dollars make a lot of money for Intel. Compare those retail prices with ARM chips and you can easily see price gouging in play.

I agree with your claim that the entire idea is to make it as fast as you can get it, but not with the implied method of getting there.
Much of the lost performance in a multi-chip system will be moving the data around between chips. Equally, faster clocks lead to greater heat which can limit processing speeds. The best way to get around these issues is to go parallel on the same die, which this WSE does in spades.

I disagree with your extrapolation of Intel chip power density to the WSE. As Ray has already said, ARM chips have lower power density, and there are other reasons to believe that the WSE can achieve much lower power density than Intel chips.

I bet the yield on a 25 x 25mm die is about 5%. Maybe there are only several good ones per wafer.

It's certainly not good. However, the fuses don't just disable features (Core i3 vs i7, cache size, etc.); they can also select redundancy circuitry to cover up defects and improve the yield considerably. Presumably the full wafer would use similar techniques, or else a single defect would ruin the whole thing.

Honestly, as interesting as the full-wafer idea is, I think its time has passed before it's even finished. The major manufacturers are going for chiplets. If wafer-sized processors are to be, and they may be, I think they will end up as a wafer-sized substrate carrying chiplets. That pretty much solves the yield issue. You have in essence a silicon circuit board with nanoscale features. Perhaps sapphire or someday even diamond might be used instead, since it's just there to support conductors and conduct heat. (Or even more advanced ideas, like a thin layer of diamond dielectric over a layer of graphene. I wish I knew how to make THAT one.) A big advantage is that the individual processing cores can be made with an advanced process optimized for the part being made, and the interposer can be built with a more robust process capable of being made with few defects, and thus in higher yields and at lower cost.
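
To put a rough number on "a single defect would ruin the whole thing", here is a sketch using the simple Poisson yield model, yield = exp(-area x defect density). The model is an assumption brought in for illustration, not something the posters used, and the 5% yield for a 625 mm^2 die is the guess from the previous post, not a measured value.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of defect-free parts under a Poisson defect model, no redundancy."""
    return math.exp(-area_mm2 * defects_per_mm2)


def implied_defect_density(area_mm2: float, yield_fraction: float) -> float:
    """Defect density that would produce the given yield at the given area."""
    return -math.log(yield_fraction) / area_mm2


DIE_AREA_MM2 = 625.0
WAFER_AREA_MM2 = math.pi * 150.0 ** 2   # full 300 mm circle

d0 = implied_defect_density(DIE_AREA_MM2, 0.05)
expected_defects = d0 * WAFER_AREA_MM2

print(f"implied defect density: {d0 * 100:.2f} per cm^2")
print(f"expected defects on a full wafer: {expected_defects:.0f}")
print(f"chance of a defect-free wafer: {poisson_yield(WAFER_AREA_MM2, d0):.1e}")
```

Under those assumptions a defect-free wafer essentially never happens, so a wafer-scale part has to tolerate defects through redundancy rather than avoid them.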

ARM chips are RUN at a lower power density. There are 48-core ARM chips that use the same power as an Intel chip. Power density is a choice. You optimize for your application. Just look at what you can do with a normal desktop CPU and liquid nitrogen cooling. But it's impractical. If you're going to make a full-wafer part and then limit it to only 1000 W, I'm skeptical that it will be worth your while. You go to all that expense, not just of making them but of designing them, have terrible yield (although there are ways to vastly improve that), and what do you get for it that you could not get with 100 or 1000 single-die processors wired together? Any way you cut it, you're power limited. Do the same calculation, except assume ARM processor cores like they use in all the mobile gadgets. So you go with a "high performance" core, where performance is taken to be computation per watt. Assume some amount of board space required per core (or multicore die in a BGA), or however you want to do it. Then load them into a rack as tight as you can get them. The power requirement is still enormous. And the limit on your computational speed still ends up being how much power you can supply to that small space.

In my previous post, I assumed that you consumed the same power per unit area as a Core i9 desktop part. If you WANTED, you could limit that power to 700 watts instead of 7 kW. Or you could go the other way and probably use 15 or 20 kW, if you could dissipate the heat. Admittedly it's a matter of where the "sweet spot" is. And maybe you get the power down to 1 kW. But that's less than an order of magnitude. You can tweak things and scale by factors of 2 or 5 or maybe even 20, but you STILL start hitting hard limits. (Tweaking may even include throwing the whole thing into a landfill and rebuilding it from scratch with what you learned building it the first time.) The processes that are available can only do so much. Find the sweet spot of your design, where you get the best computation per watt, and that pretty much determines what you can do, even if you really push the limit with a 10-story office building powered by its own nuclear reactor.