Golden Member

The system clearly works for GPUs, which are even larger and more energy-intensive. And bandwidth, with PCIe 4.0 and eventually 5.0, will start to become sufficient even for CPUs. I could see the card system being shared by both GPUs and CPUs in server farms that are going for density.

If you had, say, 64 lanes of PCIe per CPU, it could connect to the two nearest GPUs with 16 lanes each and keep 32 lanes for RAM. These three components could be interleaved across the motherboard and stick to their neighbors for most communication.
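Back-of-the-envelope numbers (my own rough figures, assuming PCIe 4.0 at 16 GT/s with 128b/130b encoding, and DDR4-3200 as the comparison point):

# Rough bandwidth comparison; the rates below are assumptions, not vendor figures.
pcie4_gbps_per_lane = 16e9 * (128 / 130) / 8 / 1e9    # ~1.97 GB/s per lane, per direction
pcie_ram_link = 32 * pcie4_gbps_per_lane              # the 32 lanes kept for RAM above

ddr4_3200_gbps_per_channel = 3200e6 * 8 / 1e9         # 64-bit channel -> 25.6 GB/s
dual_channel = 2 * ddr4_3200_gbps_per_channel

print(f"PCIe 4.0 x32: {pcie_ram_link:.1f} GB/s per direction")   # ~63.0
print(f"Dual-channel DDR4-3200: {dual_channel:.1f} GB/s")        # ~51.2

So on raw bandwidth alone, 32 lanes of PCIe 4.0 would indeed be in the same ballpark as a dual-channel DDR4 setup.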

Senior member

Which have RAM on board for a reason. While new PCIe versions multiply bandwidth, latency is usually still measured in thousands of nanoseconds, a far cry from the sub-100 ns common for accessing RAM (which is what CPUs need the many pins for).
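To put that in cycle terms (rough arithmetic, assuming a 3 GHz core purely for illustration):

# Stall-cycle math at an assumed 3 GHz clock; latencies are ballpark assumptions.
clock_hz = 3e9
for name, latency_ns in [("local DRAM, ~80 ns", 80), ("PCIe round trip, ~1000 ns", 1000)]:
    cycles = latency_ns * 1e-9 * clock_hz
    print(f"{name}: ~{cycles:.0f} cycles")
# local DRAM, ~80 ns: ~240 cycles
# PCIe round trip, ~1000 ns: ~3000 cycles

An order of magnitude more cycles spent waiting on every access is the part that extra bandwidth doesn't fix.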

Lifer

If done well it could be a really cool thing that solves a number of problems. In fact, with the right design, it could even be badass for enthusiasts if you could get good bidirectional cooling going (e.g., have the CPU die package use a conductive shell that faces both directions, perhaps 16 cores on side 'A' and 16 cores on side 'B'), with the HSF radiating in both directions.

Platinum Member

They already did a card-based CPU with Knights Landing / Xeon Phi; they don't need to reinvent the wheel. If hell ever freezes over and this becomes attractive for the mainstream market, they already have a working model.

Senior member

Chiplet design (whatever you want to call it) is the new cartridge design.

They used that design to pull cache off the die, the limiting factor being the process technology. We are now facing the same problems with miniaturisation, and the chiplet design is our new and better answer.

Senior member

Remember flip chips? Hook a socket-based CPU onto a board you would then insert into the cartridge slot. Allowed faster CPUs in motherboards that otherwise wouldn't support them... if my memory serves me right.

What's that got to do with anything? Windows can only see so many CPUs/cores, and that won't change even with cartridge CPUs; if you go for more cores than Windows can see, you will have to run some software layer for it.

Diamond Member

What's that got to do with anything? Windows can only see so many CPUs/cores, and that won't change even with cartridge CPUs; if you go for more cores than Windows can see, you will have to run some software layer for it.

My point is that these accelerator boards aren't extra cores that the OS can just schedule an arbitrary thread on; they're standalone devices that are managed by drivers and run their own software internally. Whereas the old-school cartridge CPUs were just the same as a socketed CPU; they just happened to have a weirdly shaped socket.

EDIT: From the Mustang-200 product description:

With a dual-CPU processor and an independent operating environment on a single PCIe card (2.0, x4),
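That "independent operating environment" is the tell. The split is easy to see on a Linux host (a sketch assuming Linux's standard sysfs layout; a card like the Mustang-200 would additionally need its own driver stack): the cores the scheduler can place threads on are enumerated by the OS, while an accelerator card only shows up as a PCI endpoint.

import os
from pathlib import Path

# Cores the OS scheduler can place an arbitrary thread on.
print(f"Schedulable CPUs: {os.cpu_count()}")

# PCI devices are merely enumerated endpoints; doing anything useful
# with one requires a driver and the device's own firmware/software.
for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    vendor = (dev / "vendor").read_text().strip()
    device = (dev / "device").read_text().strip()
    print(f"{dev.name}: vendor={vendor} device={device}")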

Diamond Member

CPUs are way too memory-bandwidth-heavy; you can't get enough pins on the edge of a card. The only way the CPU would move onto a "card" is if main memory moved onto it too.
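Rough pin arithmetic makes the point (approximate signal counts I'm assuming; exact numbers vary by implementation):

# Very rough pin budget for one 64-bit DDR4 channel; counts are approximations.
ddr4_signals = {
    "DQ (data)": 64,
    "DQS strobes (differential pairs)": 16,
    "address/command": 24,
    "control (CS/CKE/ODT/CK, etc.)": 12,
}
per_channel = sum(ddr4_signals.values())
print(f"One DDR4 channel: ~{per_channel} signal pins")    # ~116
print(f"Dual channel: ~{2 * per_channel} signal pins")    # ~232
# For comparison, a full PCIe x16 edge connector is 164 contacts total.

A card edge gets away with few contacts because PCIe is a narrow serial link; a wide parallel DRAM bus doesn't have that option.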

Also bear in mind that I'm talking about mainstream, high-performance CPUs. There are plenty of examples out there in embedded computing of "compute modules" that put the CPU and memory onto a daughterboard, which slots into a bigger board that is basically just an I/O expander. Take a look at COM Express, or Q7.