The 12-core die (used in the Xeon E5, too) is most likely a 15-core part. Several hints lead to this conclusion: the same package size for 15-core E7 and 12-core E5 models, and a few config registers that even in the E5 generation could address 15 cores.

Does this mean it could be possible we'll have X99 chipset motherboards supporting this CPU later this year? I'm sick of 6 and 8 cores; it's not enough for me, but I'm not spending a fortune on typical Xeon motherboards.

You must be quite an impressive power user if you need a CPU designed for heavy server loads: handling hundreds or thousands of simultaneous user requests and/or managing dozens of VMs.

Everyone is sick of the 6 and 8 cores, or the 4 cores as mainstream. They seem to have been on the market for an eternity. You can't entirely blame Intel for this, as truly multithreaded programs are still a scarcity. You can't exactly blame programmers either, because their compensation is ridiculous and their job inhumanly demanding. About this "power user" thing: it is dumb to imagine that the average Joe stands apart from "high-performance" computing, "big data", or "science apps". Remember that people in those fields readily salivate over gamers' GPUs, and be frank enough to recognize that supercomputing would have made much less progress if it weren't for the poor average Joe you seem to scorn. The fact that 12-15 cores are available for "businesses" but not for the average Joe is a travesty that stinks of decadence, waiting for the next paradigm of innovation.

This is a bit of a chicken-and-egg problem, both for the hardware and for the fact that threading support in C++ is still relatively poor: it doesn't have much in the way of features designed to help with the data-access control problem that is a big issue in multithreaded programs (and C++ is still the language of choice for performance-critical apps, the kind that would potentially benefit from better use of threading in the first place).

I'd be rather surprised if they went with the same Socket 2011 as Sandy Bridge-E/Ivy Bridge-E. One of the hallmarks of the EX line has been its good socket scalability, thanks to four QPI links. This enables 8-socket systems with minimal hops between sockets. Similarly, the EX line has used memory buffers to enable more bandwidth and higher memory capacities than the EP line.

If the included picture hasn't been modified in Photoshop, then we're looking at a different physical key for the socket.

Yes, it has 2011 pads on the package. Rather, the question is whether they are configured the same as Sandy Bridge-E/Ivy Bridge-E. In other words, does it have two QPI links, four DDR3 channels, and 40 PCIe lanes as the main I/O features?

Also worth pointing out: the rumors of Haswell-E have it using a 2011-pad package as well, but supporting DDR4 memory. That variant has been tagged 2011-v3. (Presumably Ivy Bridge-EX would be 2011-v2.)

Actually, looking at the block diagram, something is really weird. What is VMSE, and why are there seven of these links? Could this be a new serial memory bus technology? I'd fathom it would be useful as a spare memory channel in a RAID 5-like array (see the Alpha EV7). However, everything else points to two independent memory controllers, which would complicate such a function.

Furthermore, the block diagram indicates three QPI links, which would allow this chip to scale to 8 sockets. The current Sandy Bridge-E/Ivy Bridge-E only go up to quad socket.

I wonder if Intel has done something rather crazy: enabled this die to be used in both the original LGA 2011 and a new EX variant. All the 12-core Ivy Bridge-E parts were rumored to be using this die.

I've been under the impression that Ivy Bridge-EX would still be using memory buffers, similar in concept to FB-DIMMs and the memory buffers used by Nehalem/Westmere-EX. Since much of the device signaling is abstracted away from the main die, using multiple memory technologies would be possible. I thought Ivy Bridge-EX's buffers would start off supporting DDR3, with DDR4 buffers appearing next year, possibly as a mid-cycle refresh before Haswell-EX appears.

The max configuration is a UV 2000 with 256 CPUs from the Xeon E5 series (max 2,048 cores, 4,096 threads) and 64 TB of RAM in 4 racks. Note this is NOT a cluster; it's a single combined system using shared-memory NUMA. No doubt SGI will adopt the E7 line, with an architecture update if necessary. The aim is to eventually scale to the mid tens of thousands of CPUs single-image (1/4 million cores or so).

SGI developed their own QPI glue logic to enable scaling beyond 4 sockets with those chips.

It does have a massive global address space, which is nice from a programming standpoint, though to get there SGI had to go through several weird hoops. Reading the interconnect documentation, it is a novel tiered structure. The 64 TB RAM limit is imposed by the Xeon E5's physical address space. Adding another tier allows for non-coherent addressing of memory space up to 8 PB. A 64 TB region does appear to be fully cache coherent within a node, and the socket limit for a node is 256.

MPI clustering techniques are used to scale beyond that point, and SGI's interconnect chips provide some MPI acceleration to reduce CPU overhead and increase throughput. Neat stuff.

@mapesdhs: Please stop spreading this misconception. The SGI UV 2000 server with 256 sockets IS a cluster. Yes, it runs a single-image Linux across all nodes, but it is still a cluster.

First of all, there are (at least) two kinds of scalability. Scale out is a cluster: you just add another node and you have a more powerful cluster. They are very large, with hundreds or even thousands of CPUs. All supercomputers are of this type, and they run embarrassingly parallel workloads, number-crunching HPC stuff. Typically they run a small loop on each CPU, very cache intensive, doing some calculation over and over again. These servers are all HPC clusters. The SGI UV 2000 is of this type. Latency to far-off CPUs is very bad.

Scale up is a single fat, huge server. They weigh 1000 kg or so and have up to 32 sockets, or even 64 sockets. They don't run parallel workloads, no. Typically they run Enterprise workloads, such as large databases. These workloads are branch intensive and jump wildly through the code; the code will not fit into the cache. These are SMP servers running SMP workloads. SMP servers are not clusters; they are single fat servers. Sure, they can use NUMA techniques, etc., but the latency to another CPU is very low (because they only have 32/64 CPUs, which are not far away), so in effect they behave like a true SMP server. There are not many hops to reach another CPU. SGI is not of this SMP server type. Examples of this type are the IBM P795 (32 sockets), Oracle M6-32, Fujitsu M10-4S (64 sockets), and HP Integrity (64 sockets). They all run Unix OSes: Solaris, IBM AIX, HP-UX. They all cost many millions of USD. Very, very expensive, if you want 32-socket servers. For instance, the IBM P595 32-socket server used for the old TPC-C record cost 35 million USD. One single frigging server cost 35 million. With 32 sockets. They are VERY expensive. A cluster is cheap: just add some PCs and a fast switch.

Sure, there are clustered databases running on clusters, but that is not the same thing as an SMP server. An HPC cluster cannot replace an SMP server, as HPC servers cannot handle branch-intensive code; the worst-case latency is so bad that performance would grind to a halt if HPC clusters tried Enterprise workloads.

In the x86 arena, the largest SMP servers are 8-socket servers, for instance the Oracle M4800, which is just an x86 box sporting eight of these Ivy Bridge-EX CPUs. There are no 32-socket x86 servers, and no 64-socket ones. But there are 256 sockets and above (i.e. clusters). So there is a huge gap: after 8 sockets, the next step is 256 sockets (SGI UV 2000). Anything larger than 64 sockets is a cluster.

For instance, the ScaleMP Linux server sporting 8192/16384 cores and gobs of TB of RAM, very similar to this SGI UV 2000 cluster, is also a cluster. It uses a software hypervisor that tricks the Linux kernel into believing it runs on an SMP server instead of an HPC cluster: http://www.theregister.co.uk/2011/09/20/scalemp_su...

"...Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a SMP shared memory system, ScaleMP cooked up a special software hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. ... vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space. vSMP has its limits. ... The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. 'We focused on HPC as the low-hanging fruit.'"

Even SGI concedes their large Linux Altix and UV 2000 servers are clusters: http://www.realworldtech.com/sgi-interview/6/

"The success of Altix systems in the high performance computing market are a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future... However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time"

All large Linux servers with thousands of cores are clusters, and they are all used for HPC number-crunching workloads. None are used for SMP workloads. The largest Linux SMP servers are 8-socket servers; anything larger is a Linux cluster. So Linux scales up to 8 sockets in SMP servers, and on HPC clusters Linux scales well up to thousands of sockets. On SMP servers, Linux does not scale well. People have tried to compile Linux for the big Unix servers, for instance the "Big Tux" server, which is the HP Integrity 64-socket Unix server, with terrible results: CPU utilization was 40% or so, which means every other CPU was idle under full load. Linux's limit is somewhere around 8 sockets; it does not scale further.

That is the reason Linux does not venture into the Enterprise arena. Enterprise, which is very lucrative, needs huge 32-socket SMP servers to run huge databases, and customers shell out millions of USD for a single 32-socket server. If Linux could venture into that arena, it would. But there are no such big Linux SMP servers on the market. If you know of any, please link. I have never seen a Linux SMP server beyond 8 sockets. The Big Tux server is an HP-UX server, so it is not a Linux server; it is a Linux experiment with bad performance and results.

So, these large Linux servers are all clusters, which is evidenced by the fact that they all run HPC workloads. None run SMP workloads. Please post a link if you know of a counterexample (you will not find any, trust me).

Here is a counter example: http://www.sgi.com/pdfs/4192.pdf It describes the ASIC used in the SGI UV 2000 and how it links everything together. In particular, it spells out how this is different from a cluster. The main points are as follows:
* Global memory space: every byte of memory is addressable directly from any CPU core.
* Cache coherent for systems up to 64 TB.
* One instance of an operating system across the entire system, without the need for a hypervisor (this is different from ScaleMP, which has to have a hypervisor running on each node).
I also would not cite an SGI interview from 2004 regarding technology introduced in 2012. A lot has changed in 8 years.

Similarly, the "Big Tux" experiment used older Itanium chips that still used an FSB. They have since moved to the same QPI bus as modern Xeons. Scaling to higher socket counts is better on the Itanium side, as it has more QPI links. Of course, this is a somewhat moot point, as all enterprise Linux distributions dropped Itanium support years ago.

Oracle is working on adding SPARC support to its Oracle Linux distribution. This would be another source for a large coherent system capable of running a single Linux image. No other enterprise Linux distribution will be officially supported on Oracle's hardware.

Where is the counter example? I am asking for an example of a Linux server with more than 8 sockets that runs SMP workloads, namely Enterprise stuff such as big databases. The SGI server you link to is a cluster. It says so in your link: they talk a lot about "MPI", which is a library for doing HPC calculations on clusters. MPI is never used on SMP servers; it would be catastrophic to develop an Oracle or DB2 database using clustered techniques such as MPI. http://en.wikipedia.org/wiki/Message_Passing_Inter...

"MPI remains the dominant model used in High-Performance Computing today... MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run such programs."

So, there are no large SMP Linux servers. Sure, you can compile Linux for the IBM P795 Unix server, but that would be nasty. The P795 is very, very expensive, tens of millions of USD, and because Linux does not scale beyond 8 sockets on SMP servers, the performance would be bad too. It would be a bad idea to buy a very expensive Unix server and install Linux instead.

Regarding the Oracle SPARC servers: Larry Ellison said officially, when he bought Sun, that Linux is for the low end and Solaris for the high end. Oracle is not offering any big Linux servers. All 32-socket servers run Solaris.

Have you never wondered why the well-researched and mature Unix vendors have for decades stopped at 32/64-socket servers? They have had 32-socket Unix servers for decades, but nothing larger. Why not? Whereas the buggy Linux has 8-socket servers and servers with tens of thousands of cores, but nothing in between. No vendor manufactures 32-socket Linux servers. You would need to recompile Linux for 32-socket Unix servers, with bad performance results. The answer is that Linux scales badly on SMP servers, 8 sockets being the maximum, and all larger Linux servers are clusters, such as the SGI UV 2000 or the ScaleMP servers. Everybody wants to go into the Enterprise segment, which is very lucrative, but until someone builds 16-socket Linux servers optimized for Linux, the Enterprise segment belongs to Unix and IBM Mainframes.

BTW, Oracle is developing a 96-socket SPARC server designed to run huge databases, i.e. SMP workloads. You can't use MPI for Enterprise workloads; MPI is used for clustered HPC number crunching. Also, in 2015 Oracle will release a 16,384-thread SPARC server with 64 TB of RAM. Both of them running Solaris, of course.

You need to define precisely what an actual SMP is. I would argue that the main attributes are a global memory address space, cache coherency between all sockets, and only one instance of an OS/hypervisor being necessary to run across all sockets.

Also, you apparently didn't read your link to the Register very well. To quote it: "This is no different than the NUMAlink 6 interconnect from Silicon Graphics...". Note that the SGI UV 2000 uses NUMAlink 6, and according to your own reference it is an SMP machine. So please, in your definition, explain why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not.

As for MPI, it is useful on large SMP machines due to its ability to take advantage of memory locality in a NUMA system. It is simply desirable to run a calculation on the core that resides closest to where the variables are stored in memory. That reduces the number of links data has to move over before it is processed, thus improving efficiency. This idea applies both to large-scale NUMA, where the links are inter-socket, and to clusters, where the links are high-speed networking interfaces. Using MPI provides a common interface to the programmer regardless of whether the code is running on a massively parallel SMP machine or a cluster made of hundreds of independent nodes.

As for the IBM p795, you don't have to do any compiling: IBM has precompiled Red Hat and SUSE binaries ready to go. That is beside the point, though: regardless of price, it is a large SMP server that can run Linux in an enterprise environment with full support. It meets your criteria for something you said did not exist. As for your thoughts on Linux not scaling for business applications, IBM does list world records for SPECjbb2005 using a p795 and Linux: http://www-03.ibm.com/systems/power/hardware/bench...

"...So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not..."

The definition of SMP is not in the architecture, how the server is built, or which CPU it uses. The definition of SMP is whether it can be used for SMP workloads, simple as that. And as SGI and ScaleMP, both selling large Linux servers with tens of thousands of cores, say in my links: these SGI and ScaleMP servers are only used for HPC, and never for SMP workloads. Read my links.

Simple as that: they say it explicitly, "not for SMP workloads, only for HPC workloads".

I don't care how it is built: if a cluster can replace an SMP server running SMP workloads, then that cluster is good for SMP workloads. But the fact is, no cluster can run SMP workloads; they can only run HPC workloads.

If you have a counterexample of SGI or ScaleMP running SMP workloads, please post it here. That would be the first time a cluster has replaced an SMP server.

That does not address the architectural similarities between the SGI UV 2000 and the SPARC M6 for what defines a big SMP. Rather, you're attempting to use the intentionally vague definition of merely running SMP-style software. I fully reject that definition, as a single-socket desktop computer with enough RAM can run that software with no issue. Sure, it'll be slower than these big multisocket machines, and the results may be questionable as it has no real RAS features, but it would work. I also reject the idea that clusters cannot run what you define as SMP workloads: enterprise-scale applications are designed to run on clusters for the simple reason of redundancy. For example, large databases run in at least pairs to cover possible hardware failure and/or the need to service a machine (and depending on the DB, both instances can be active, though it is unwise to go beyond 50% capacity per machine). Furthermore, these clusters have remote replication to another data center in case of a local catastrophe. That'd be three or more instances in a cluster.

Thus I stand by my definition of what an SMP machine is: global memory space, cache coherency across multiple sockets and only one OS/hypervisor necessary across the entire system.

My rationale is simple: a "cluster" by definition does not have the low latency required to function as a shared-memory, single combined system. The UV 2000 does, hence it's not a cluster. I know people who write scalable code for 512+ cores, and that's just on the older Origin systems, which are not as fast. There's a lot of effort going into increasing code scalability, especially since SGI intends to increase the max cores past 250K.

If you want to regard the UV 2000 as a cluster, feel free, but it's not, because it functions in a manner which a conventional cluster simply can't: shared memory, low-latency RAM, highly scalable I/O. Clusters use networking technologies, InfiniBand, etc., to pass data around; the UV can have a single OS instance run the entire system. Its use of NUMAlink 6 to route data around the system isn't sufficient reason to call it a cluster, because NUMA isn't a networking tech.

Based on the scalability of target problems, one can partition UV systems into multiple portions which can communicate, but they still benefit from the high I/O available across the system.

It's not a cluster, and no amount of posting oodles of paragraphs will change that fact.

Ian.

PS. Kevin, thanks for your followup comments! I think at the time this article was current, I just couldn't be bothered to read Brutalizer's post. :D

Clicking through to the CPU-World page lists that CPU as not having a Turbo mode. Specifications are still unconfirmed at this point; as mentioned in the piece, Intel often does a balancing act of cores vs. MHz and will never release a max-core model at max frequency.

My original source for the information was a PCWorld article, until I was forwarded the Intel information direct. I have used information from CPU-World as well, who have used a different source.

I really expect that the E7-2800/4800/8800 v2 family (Ivy Bridge-EX) will have Turbo Boost. We just don't know what the specs will be from the various leaked sources. It is also supposed to have triple the memory density of Westmere-EX, plus PCIe 3.0 support.

Very frustrating that my company (architectural visualisation) could absolutely make use of these chips in our render farm, yet our margins mean we will never be able to afford them. Intel's pricing for anything with more than 6 cores is just depressing. I guess we're just unfortunate to be in a no-man's-land market segment that benefits from multi-core CPUs but doesn't generate enough revenue to feast at the high table.

Near as I can find, it's been years (a decade+) since Cray built a machine with a Cray CPU; it's been AMD and Intel. Lots of chips in the cabinet. The interconnects have been Cray's special sauce.

Yeah, possibly Nvidia Tesla GPUs in the mix as well, since crypto cracking needs plenty of FPU power. These 15-core monsters with 4.5 billion transistors certainly are rather power efficient at 150W TDP.

Cray, through its YarcData subsidiary, sells machines based on the custom ThreadStorm processor, which is a single-core 500MHz 3-issue VLIW design with 128-thread fine-grained multithreading, based on the 1990s Tera MTA machine. I assume that is what Ktracho was referring to.

The 16-core Opterons are cheaper and consume way less power, although they are not as powerful per core as the Intel part. But it seems Intel is responding to the 16-core Opteron release, as AMD's pricing is much more reasonable for servers. I am more concerned with scaling upwards in terms of core count. One can see Intel NEEDS a huge L3 cache to keep its cores fed, while AMD uses a larger L2 cache exclusive to each core and a small L3 cache. When AMD uses HSA for server chips, it will be interesting to see what they put in as non-CPU compute cores. Maybe a giant quad-pumped FPU cluster that does 4 ops per cycle and crunches DP floating point faster than ever before.

Ivy Bridge EX is simply in a different league compared to anything AMD has to offer today.

So, no, Intel is not responding to AMD with Ivy Bridge EX, as AMD has close to zero market share in this segment and has nothing to offer.

Don't get me wrong, I'd very much love AMD to become competitive again, as Intel has become a de facto monopoly and basically segments the market with eFuses which control features worth four-digit dollar amounts.

It is a sad state for the IT industry as a whole, but if we talk about performance and RAS features, Ivy Bridge EX has no competing product in AMD's lineup; it is literally 2 or 3 generations ahead.

It has Turbo, and it goes to insane speeds (for a high-end server CPU); this is what the 165W TDP is for :-) The article has a typo.

And, unfortunately, you won't be able to upgrade your Mac Pro with Ivy Bridge EX, as the socket is different (EX uses LGA 2011-1; the EP in the Mac Pro uses LGA 2011). If you need more CPU performance than a single Intel Xeon E5-2697 v2 has to offer, you have the following options: