According to Intel, it does. Given that one can buy and install a 7970 and one cannot yet buy the Laughabee card, I would say that AMD's "theoretical" performance is a lot more practical for the time being. By the time this thing comes out nVidia and AMD will be rolling out second-gen 28nm parts.

He does have a point, though. We're talking about the difference between the theoretical performance of a shipping product versus the vendor-reported performance of an unshipped product. It's a pretty silly comparison, really.

This fetus named DigitalStuck is named so for a reason, 'cause he's still stuck in the vagina after so many years. Sometimes he tries to talk, but the labia majora keeps his hole plugged most of the time, fortunately for all of us.

So do you have any benchmarks to show that Intel's Xeon Phi achieves 1 TFLOP of actual DP performance? Until you have some real benchmarks, it's best not to comment. The Radeon HD 7970 has been reviewed and has proved its compute performance in many benchmarks like LuxMark and SiSoft Sandra. These chips have been praised for their compute performance.

The Top 500 result achieves 118 TFlops with 9800 cores. Making the big assumption that all of the performance came from 50-core MIC cards, that'd put performance per card at 602 GFlops double precision. At 64 cores per card, double precision performance would be 770 GFlops. Chances are that part of the result also used the Sandy Bridge CPUs; otherwise it would have made more sense to go with quad-core Xeons to make the power consumption figures look better. How much this would skew results depends on the system configuration: two Xeon E5-2670s per MIC card would inflate the per-card rating more than one Xeon E5-2670 for four MIC cards would.
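The back-of-envelope math above can be sketched quickly (the 118 TFlops / 9800 cores figures are taken from the comment; the assumption that every flop came from the MIC cards is the same big one noted above):

```python
# Per-card DP estimate from the Top500 entry, assuming all flops came
# from the MIC cards and no CPU contribution.
total_gflops = 118_000.0   # 118 TFlops for the whole run
total_cores = 9800

for cores_per_card in (50, 64):
    cards = total_cores / cores_per_card
    per_card = total_gflops / cards
    print(f"{cores_per_card}-core cards: {cards:.0f} cards, "
          f"~{per_card:.0f} GFlops DP each")
```

With 50-core cards this lands at roughly 602 GFlops per card, and with 64-core cards around 770 GFlops, matching the figures in the comment.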

There are a few factors that could raise those scores. As a prototype, clock speeds were likely conservative, and there is also the possibility of turbo coming into play. Furthermore, results for a single card and host will likely be higher due to the removal of network overhead.

Regardless, these results paint Xeon Phi as merely competitive instead of having a decisive performance edge over its GPU counterparts.

If Intel is beating their chest over "theoretical MFLOPS" I would simply assume that is because they can only claim "machoflops" (haven't heard that term in forever) instead of actually pushing the doubles through.

One other issue is that if they are gunning for the Top500 list, LuxMark and SiSoft don't matter; they need to use LINPACK. There may be internal issues between using numbers that aren't based on LINPACK and aren't as high as the competition's (using something else).

Also, pointing out NVIDIA's GK110-based Tesla K20 is pretty much a joke considering it runs at 20% the FP power of the AMD and Intel systems mentioned (for single-precision work it should be unstoppable, but don't expect it to be useful for much HPC work).

Finally, I wonder what it must be like to work at AMD or Nvidia and watch Intel casually launch a swing-for-the-fences product that challenges your bread and butter. They might have a 15-year history of complete fail (on these high-end coprocessors), but it looks like the engineers/groups on the project change, and you have to worry each time they try.

Considering the rumors I have heard so far, Tesla K20 will be focused on double-precision calculation, carrying around 50-80% of the single-precision floating point performance in double; around 1.5 TFlops. But these are only rumors, and as such I would avoid them till we see it in action.

The chip details about gk110 aren't rumors any more - 15 SMX (with 192 SP ALUs and 64 DP ALUs each) are confirmed. So a 1/3 DP rate. The exact flops rate though isn't known, since neither the clock speed nor the actual active unit count (there's a good chance at least one SMX is always disabled) is known. But it should end up in the neighbourhood of 4 TFlops single / 1.3 TFlops double.
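Those ballpark figures fall out of the confirmed unit counts, given an assumed clock (the ~700 MHz below is a guess, since Nvidia hadn't announced one):

```python
# GK110 theoretical throughput from the confirmed unit counts.
smx_count = 15            # may ship with one SMX disabled
sp_alus_per_smx = 192
dp_alus_per_smx = 64
clock_ghz = 0.7           # assumed clock, not announced
flops_per_alu = 2         # fused multiply-add = 2 flops per cycle

sp_tflops = smx_count * sp_alus_per_smx * flops_per_alu * clock_ghz / 1000
dp_tflops = smx_count * dp_alus_per_smx * flops_per_alu * clock_ghz / 1000
print(f"SP: {sp_tflops:.2f} TFlops, DP: {dp_tflops:.2f} TFlops")
```

At 700 MHz with all 15 SMX active this gives about 4.0 TFlops single / 1.3 TFlops double; disabling an SMX or changing the clock scales both numbers linearly.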

Technically no, because Intel's only numbers are per-node, which leaves the question of whether they're allowing the CPUs to contribute or not. If they are including the CPUs, then a single Xeon Phi gets around 700 GFlops in Linpack (Rmax).

And yes, I consider a presentation by Intel using the industry standard benchmark to be a 'real benchmark'. Far more real than what the GPU companies typically throw around in their PR materials.

Disregarding the issues of "real" vs. "theoretical" flops (which we don't really know enough about; if, for instance, Intel has a 512-bit memory interface, that could indeed also give an advantage), this is only for DP flops. But I think SP flops shouldn't be completely neglected, and the 7970 (as well as the Nvidia K10, though of course that one stinks at DP) very easily beats Knights Corner there. There's a lot more than raw DP flops that counts, though, so it may still be quite OK. It doesn't have any of the graphics "baggage", and the "many-core" approach is certainly a bit different.

How much easier will it be? With all the advancement in GPU programming, and with Microsoft integrating C++ AMP (Accelerated Massive Parallelism) into VS2012, Intel would have trouble selling these if that's their strongest argument.

It is algorithm design that is easier, not tool usage or language features.

It depends on the problem. GPGPU is only good at data-parallel algorithms. If you don't have a lot of data that can be broken into many chunks that can each be processed independently, then it won't work well. Developing for GPUs is an ongoing attempt to eliminate branching, because branching can very easily stall the pipeline. In other words, it is often better to pre-calculate all possibilities in parallel, then choose the correct one in the end. It can quickly get complicated trying to remove if / then logic.
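The "pre-calculate all possibilities in parallel, then choose the correct one" pattern described above can be sketched in plain Python (a toy illustration of the idea, not actual GPU code; on a real GPU the mask would come from a hardware predicate):

```python
# Branch-free select: compute BOTH outcomes for every element, then pick
# one with a 0/1 mask, instead of taking an if/else path per element.
# This mirrors how GPU kernels avoid divergent branching.
def select_branchless(xs):
    results = []
    for x in xs:
        squared = x * x              # "then" outcome, always computed
        negated = -x                 # "else" outcome, always computed
        mask = 1 if x >= 0 else 0    # predicate as a 0/1 value
        results.append(mask * squared + (1 - mask) * negated)
    return results

print(select_branchless([-2, -1, 0, 3]))  # [2, 1, 0, 9]
```

Both branches are always evaluated, which wastes arithmetic, but every element follows the identical instruction sequence, which is exactly what wide SIMD/SIMT hardware wants.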

MIC, though, uses general purpose CPU cores that don't have the same issues with branching, yet has a 16-wide vector unit. While not nearly as wide as a GPU, it is still sort of the best of both worlds. The flexibility makes it easier to program. And, for some problems that are not so data-parallel, it makes it much easier.

How is Intel kicking Nvidia while they're down? You're speaking as though Xeon Phi is already available, while the latest road maps indicate that Nvidia's Tesla K20 will be launching first. And I'm not sure if you've realized this, but the theoretical fp64 performance of a fully enabled gk110 should be quite a bit higher than 1 TFLOP, assuming reasonable clocks. gk110 can run DP at 1/3 its fp32 rate, and gk104 is already capable of pushing 3 TFLOPs fp32 with 1536 cores. So even assuming the gk110 in K20 will be clocked significantly lower (which is pretty much a certainty), Nvidia should have absolutely no problem exceeding 1 TFLOP theoretical fp64 performance. Real world performance is another story though. For that we'll just have to wait for benchmarks.

As for the HD7970, I'm not even sure how it's relevant. Pros in the market for a Tesla or Xeon Phi won't even consider an HD7970 as an option. It has neither the industry nor the driver support to be a viable option in this area. However, like Ryan said, given AMD's shift in focus with Southern Islands we may very well see a viable option based on GCN before the year is out.

It really has nothing to do with the common Xeon platform known today. Granted, the Xeon brand started way back around the Pentium II era and the MIC uses modified Pentium cores, but I find it a little sad that with their marketing budget they couldn't come up with a better name.

Atom, Core, Xeon. Seems like all they needed was a good four-letter name that vaguely resembles something from a science textbook.

Is anyone else wondering why Intel quit with the GPU project? Was it fear of more anti-trust litigation? From the comparisons I have seen, Intel is able to more than compete in the iGPU arena in terms of performance/die area, and I find it hard to believe Intel would experience fabrication woes on the order of what GloFo has gone through. Just wondering...

The original Larrabee was based on a mature 45nm process. The chip had its own problems: it wasn't performant enough, and there were doubts whether going into the resource-intensive yet low-margin video card market made sense.

Larrabee did software rendering for everything except the texture unit. Soon after they gave up on graphics for Larrabee, one thing they touted for Sandy Bridge's iGPU was that some form of fixed-function hardware is necessary for good performance.

It wasn't raw compute performance that held Larrabee back; that functionality worked, from all accounts. What held the designs back was the software stack as a GPU. Intel simply couldn't create a driver that allowed the chip to perform at a competitive level. The chip was reportedly power hungry and used a 6 + 8 pin setup in the few public demos it was seen in.

I do think it was wise to cancel Larrabee when they did, as it would have also had the impact of forking the x86 ISA further. This time around, the vector extensions are themselves an extension of AVX instead of an incompatible competitor. While optimized MIC code may not run on Sandy Bridge/Ivy Bridge, AVX code written with those chips in mind will now work on the MIC card.

IT won't abandon Quadro or Tesla that fast. They are really settled in for many years; even AMD with their FirePro brand, which often provides a better price/performance ratio, isn't able to gain much market share. Typical human behaviour: they go for a brand name.

As an IT reseller I can confirm that. I recommend AMD FirePro to price-conscious users and almost invariably they prefer to buy lower-grade Quadro cards, even if they don't intend to use it for 3D rendering. The mind boggles.

Nothing Intel has shown off yet would raise the ire of anti-trust regulators. Just give it a generation or two, when PCI-e bandwidth becomes a compute bottleneck and latencies get too high. The natural resolution would be to move the MIC chip from a PCI-e card to the motherboard, where it would connect to Intel Xeon chips over QPI. That is far more restrictive in terms of platform, though it may not be enough for regulators to actually step in, as it is the natural path for this technology to take.

No. The natural plan is to move the MIC onto the same die with a few modern Xeon cores. At that point, no PCIe card could compete, because the bandwidth would be too high. Also, it removes the memory-to-memory copies altogether, since the MIC cores and Xeon cores would have access to the same RAM. All cores would participate in the same shared memory ring bus, which is a 1024-bit (512-bit bidirectional) bus on the current MIC.

Holding something "close to the chest" has been the phrase for the past few decades, about keeping details secret or private. Another variant is "playing my cards close to my chest", which of course comes from poker.

I have never heard nor seen "close to the vest" used. No idea where you're getting this.

I'm very interested in any solution to speed up 3D renderings, like many other CG artists; the obvious options being an 18-core dual-Xeon machine with a top Quadro, a second dedicated Xeon machine, or even a small render farm. I don't see Tesla listed on Nvidia's own page with high-end 3D rendering software, and I don't know why, but I guess it may be because it is a GPU architecture not fully supported by more complex software rendering engines like, say, Mental Ray or the Maya Software Renderer.

Could this Xeon Phi be more transparent to current high-end software rendering? Would I be able to install one or more of these PCIe cards into the same Xeon machine to speed up a software rendering engine specifically?

To me, the advantage of this co-processor is that I would have more freedom to choose a very fast 6-core Xeon for multitasking in a 3D-specific environment, move the co-processors to newer Xeon architectures (I am not sure), and not have to start all over with every new generation. I might end up with a more compact full-tower Xeon system, or maybe with a second, more independent dedicated co-processor machine. Sorry if I am asking for too much...

...how much is this going to cost? How much power is it going to use? Are the drivers going to be any good? I know these aren't answers anybody can provide right now, but I don't expect something sporting 50+ "simple" x86 cores to be cheap or frugal. Still, you never know.

I'm not sure why you mentioned the 7970, though, being that it's a desktop card. The FirePro W9000 is really what we need to compare it to: 6GB GDDR5, a 1GHz Tahiti XT card rated at, oddly, 1 TFLOP for double precision. The K20 is rated at an astounding 1.7 TFLOPs, so Intel and AMD had better have something ready to tackle that one.
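That ~1 TFLOP DP rating is less odd than it looks once you run Tahiti's numbers (the 2048-ALU count, FMA throughput, and 1/4 DP rate below are the publicly known Tahiti figures; treat this as a back-of-envelope check, not a vendor spec):

```python
# Tahiti XT (W9000) theoretical throughput at its 1 GHz clock.
alus = 2048          # stream processors on Tahiti
clock_ghz = 1.0
fma_flops = 2        # fused multiply-add = 2 flops per cycle
dp_rate = 1 / 4      # Tahiti runs DP at 1/4 its SP rate

sp_tflops = alus * fma_flops * clock_ghz / 1000
dp_tflops = sp_tflops * dp_rate
print(f"SP: {sp_tflops:.3f} TFlops, DP: {dp_tflops:.3f} TFlops")
```

That works out to about 4.1 TFlops single precision and 1.02 TFlops double precision, which is where the W9000's 1 TFLOP DP rating comes from.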