Posted by Zonk on Tuesday September 26, 2006 @04:16PM
from the thinking-of-the-gaming dept.

ZonkerWilliam writes "Intel has developed an 80-core processor that it claims 'can perform a trillion floating point operations per second.'" From the article: "CEO Paul Otellini held up a silicon wafer with the prototype chips before several thousand attendees at the Intel Developer Forum here on Tuesday. The chips are capable of exchanging data at a terabyte a second, Otellini said during a keynote speech. The company hopes to have these chips ready for commercial production within a five-year window."

On the other hand, the Osborne II never ran the risk of spontaneously inducing nuclear fusion in ambient atmosphere. (I don't even wanna imagine the heat output of 80 cores, even with the relentless march of technology.)

I'm gonna wait until 2020 when they finally merge them all back into one fast core.

They'll call it a "Bose-Wintel condensate". Instructions will be sent to the single core, but there will be no way of distinguishing which of the merged cores an instruction was run on. This will play havoc with floating point precision, but as Intel commented, "most users don't need that kind of precision anyway".

The condensed core will also be subject to the laws of quantum mechanics, in that, before a program has finished running, there will be no way to know whether it will crash or not. Microsoft plans to leverage this to further stabilise their latest version of Windows. Security experts worried about the onboard "Quantum-Threading" technology redirecting portions of thread output randomly to other threads were dismissed as not being "forward looking".

Meanwhile, AMD's new 1W, 128-core, 4096-bit chip with 1GB of L2 cache retails for almost 50% more than Intel's Bose-Wintel chips, and has seen sluggish sales since the arrival of the new technology, despite its lower running cost than the 5MW Intel chip. When asked for comment, AMD's spokesman added: "Ch@#&t!! What the f**k is wrong with you people!??! Our chips save you money!! F@#*&^g cheapskates!!!"

Upon hearing the news, Linux founder and lead developer Linus Torvalds (51) said: "We're not rewriting the kernel for that monstrosity." An Intel representative declared that the company was "disappointed" in Torvalds' remarks. Apple cofounder Steve Jobs (65), when asked whether Apple intended to release a new Mac based on the chipset, declined to comment as he went about his daily 5km morning run. Apple pundits widely believe that the new Mac will run on a quad-core Bose-Wintel condensate, and to complement this will sport a blazing white, ultra-smooth case made out of Bose-Einstein condensate, the fifth state of matter.

In a related story, Microsoft cofounder Bill Gates (65) assaulted a technology reporter at a company press conference discussing the new chip. Details are sketchy, but reports mention that one of Mr Gates' older quotes about appropriate amounts of computer memory was brought up. Redmond police have declined to comment on the case.

Exchanging data (data transfer) is not the same thing as operations per second. The post seems to either be confusing the two or stating that the chip does both. I guess I need to go read the article now and find out...

"In order to move data in between individual cores and into memory, the company plans to use an on-chip interconnect fabric and stacked SRAM (static RAM) chips attached directly to the bottom of the chip, he said."

Clarification from TFA: "But the ultimate goal, as envisioned by Intel's terascale research prototype, is to enable a trillion floating-point operations per second--a teraflop--on a single chip."

Further clarification from TFA:"Connecting chips directly to each other through tiny wires is called Through Silicon Vias, which Intel discussed in 2005. TSV will give the chip an aggregate memory bandwidth of 1 terabyte per second."

The first thing I thought when I saw this was that they really ought to dial in Quad Core before boasting twenty times that.
Apparently, AMD will be peddling them within the next year.

From TFA:

As expected, Intel announced plans to have quad-core processors ready for its customers in November. An extremely fast Core 2 Extreme processor with four cores will be released then, and the newly named Core 2 Quad processor for mainstream desktops will follow in the first quarter of next year, Otellini said.

Imagine the pain of having to write functional applications with so many cores. I hope the interconnect will be very, very fast; otherwise writing massively scalable parallel algorithms will be massively painful. And with so many cores, one will need multiple independent memory banks with some kind of NUMA, and writing apps for those things isn't fun. You have to spend so much time caring about the parallel stuff instead of caring about the problem.

Not really, as long as you pick the right tools for the job. Writing code for such a machine using a threaded model would obviously be stupid. Writing it in an asynchronous CSP-based language like Erlang is much easier. There's a language I saw a presentation on from some guys at IBM that looks potentially even more promising, although I can't recall its name at the moment.
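To make the CSP point concrete, here's a minimal sketch of the message-passing style in Python (standing in for Erlang, purely for illustration): workers share nothing and communicate only through queues, so scaling to more cores just means spawning more workers.

```python
import threading
import queue

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A share-nothing worker: receives messages, replies with results."""
    while True:
        msg = inbox.get()
        if msg is None:          # poison pill: shut down
            break
        outbox.put(msg * msg)    # stand-in for real work

def parallel_squares(values, n_workers=4):
    """Fan work out to message-passing workers and collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for v in values:
        inbox.put(v)
    results = [outbox.get() for _ in values]
    for _ in workers:            # one poison pill per worker
        inbox.put(None)
    for w in workers:
        w.join()
    return sorted(results)
```

Because no state is shared, there are no locks to reason about; the queues are the whole synchronization story.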

As with anything else in the last 10 years, if you try to pretend you're still writing code for a PDP-11, you'll have problems.

I seriously hope that power consumption and heat dissipation are really attacked before these things come out. Can you imagine needing a 200-amp service and liquid nitrogen cooling for something like that right now?

This is hilarious, because if this goes out on the market there aren't going to be many operating systems capable of scheduling on that many cores usefully. OS X can't do it, Windows can't do it, and neither can BSD. But Linux has already been scheduling on systems with up to 1,024 processors. :)

Wow, good point. I bet Intel never once stopped to think about THAT. I sincerely doubt this will make it anywhere near Fry's or CompUSA, assuming it launches in ~5 years. Most likely academic, corporate (think of the old days and mainframe number crunchers on Wall Street), and scientific.

Simply cheap teraflops for custom applications.

Of course, everyone thought it was a great idea when Cell announced they could do 64 or more cores. But since this is /. versus Intel, everything has to be a joke, right?

Scheduling isn't a one-size-fits-all process. What works at 4 cores doesn't work at 40, and so on. As for other operating systems, FreeBSD has been working quite actively on getting Niagara working well with their sparc64 port. I've been saying it didn't make sense until this announcement. I figured we'd have no more than 8 cores in 5 years. We'll see what really happens.

The BSD projects, Apple and Microsoft have five years. Microsoft announced a while back that they want to work on supercomputing versions of Windows. Perhaps they will have something by then. Apple and Intel are bed partners now; I'm sure Intel will help them.

What this announcement really means is that computer programmers must learn how to break up problems more effectively to take advantage of threading. Computer science programs need to start teaching this shit. A quick "you can do it; go get a master's degree to learn more" isn't going to cut it anymore. There's no going back now.
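As a toy illustration of the kind of decomposition involved (a Python sketch, nothing from the article): split a big reduction into independent chunks, farm the chunks out, then combine the partial results. The combine step is the only part that stays sequential.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, n_chunks=8):
    """Split a reduction into independent chunks and combine the partials."""
    if not data:
        return 0
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(sum, chunks)   # each chunk is independent work
    return sum(partials)                   # the only sequential step
```

The same split/map/combine shape applies whether the workers are threads, processes, or 80 cores on one die; what changes is how expensive it is to move the chunks around.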

Intel's prototype uses 80 floating-point cores, each running at 3.16GHz, said Justin Rattner, Intel's chief technology officer, in a speech following Otellini's address. In order to move data in between individual cores and into memory, the company plans to use an on-chip interconnect fabric and stacked SRAM (static RAM) chips attached directly to the bottom of the chip, he said.

So think more like Cell with 80 SPEs. Great for lots of vector processing.

With the heavily threaded nature of BeOS, even demanding apps would really fly on the quad+ core CPUs that are preparing to take over the world.

Not that you couldn't do threading right in Windows, OS X, or Linux. But BeOS made it practically mandatory: each window was a new thread, as well as an application-level thread, plus any others you wanted to create. So making a crappy application that locks up when it is trying to do something (like update the state of 500+ nodes in a list; ARD3, I'm looking at you) actually took skill and dedication. The default state tended to be applications that wouldn't lock up while they worked, which is really nice.

Today, a 2-CPU x 2-core computer can actually be slower than a 2x1 or 1x2 core machine for certain "cherry-picked to be hard" operations, due to the OS making incorrect assumptions about things like shared/unshared cache (two cores on the same chip may share cache; two on different chips may not) and other issues related to the fact that not all cores are equal as seen by the other cores. In an 80-core environment, there will likely be inequalities due to chip real-estate issues and other considerations.

Why is that unfortunate? Software and hardware have always run at pretty much the same pace, but I would rather have an 80-core processor which I can keep for 10 years, updating my OS to take advantage of more of the cores as time goes by, than have to buy a whole new system every 3 years at least.

Really, if you read the story, it is 80 floating point cores! It would be ideal for many graphics, simulation, and general DSP jobs. What it isn't is 80 CPU cores. Really interesting research, but not likely to show up in your PC anytime soon. With all these multi-core chips I am waiting for more specialized cores to start being included on the die. After all, a standard notebook will have a CPU, GPU and usually a DSP for WiFi. Seems like linking them all with something like AMD's HyperTransport could offer some real advantages.

... not for the high price Gassée wanted for it, but for what Palm got it for. They need that pervasive multi-threading now more than ever. NeXT was good and all, but are they really going to be able to retrofit the whole thing? Oh well, at least they've got plenty of old BeOS employees. The pervasive beach-balls, however, make me wonder what they're doing all day; new kernel?

not 80 general purpose integer cores. They're essentially copying the Cell design with large numbers of DSPs each of which has a local store RAM burned onto the main chip. Is this a good idea? Guess we'll find out with the Cell. What interests me most about this announcement is not the computing potential from such a strategy, but that it's an obvious response to IBM and Sony technology.

A couple of things to mention here. Many years ago I read an Intel road map for the x86 processors. It was more than 10 years ago, less than 20 I think. In it they said they would have massively multicore processors coming along around now. They may have forgotten that and reinvented the goal along the way, companies do that. But, they really have been predicting this for a very long time.

The other thing is that with that many cores and all the SIMD and graphics instructions that are built into current processors, it looks to me like the obvious reason to have 80 cores is to get rid of graphics coprocessors. You do not need a GPU and a bunch of shaders if you can throw 60 processors at the job. You do need a really good bus, but hey, not much of a problem compared to getting 80 cores working on one chip.

With that kind of computing power you can throw a core at anything you currently use a special chip for. You can get rid of sound cards, network cards, graphics cards... all you need is lots of cores, lots of RAM, a fast interconnect, and some interface logic. Everything else is just a waste of silicon.

History has shown that general purpose processing always wins in the end.

I was talking to some folks about this just last Saturday. They didn't believe me. I don't expect y'all to believe me either. :-) The counter-example everyone came up with was, "well, if that is true, why would AMD buy ATI?" The answer to that is simple: they want their patent portfolio and their name. In the short term it even makes sense to put a GPU and some shaders on a chip along with a few cores. At the point you can put 16 or so cores on a chip, you won't have much use for a GPU.

I remember doing a project in college where we had to implement an 8-point FFT in software and hardware. It was eye-opening. The hardware implementation ran on an FPGA with something like a 23MHz clock. The software solution was a C program running on a 2GHz desktop. 23MHz vs. 2GHz. The hardware solution was more than 10x faster.
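For reference, the classic radix-2 decimation-in-time FFT that such a project typically implements looks like this (a Python sketch for illustration, not the poster's actual C or FPGA code):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: top output
        out[k + n // 2] = even[k] - twiddle  # butterfly: bottom output
    return out
```

An 8-point input needs only three stages of this butterfly, which is part of why it maps so naturally onto parallel FPGA hardware.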

I don't think that general purpose processors will ever completely replace special purpose hardware. There is simply too much to be gained by implementing certain features directly on the chip.

This is the last 3 years of Intel, all over again. Only now the megahertz race is replaced with the multi-core race.

Intel will create the "CoreScale" technology and make 4, then 8, then 16 cores and up, while their competitors are increasing operations per clock cycle per watt per core. Consumers won't know any better, so they will buy the Intel 64-core processor that runs hotter and slower than the cheaper clone chip that has only 8 cores. Then when Intel runs up against a wall and gets their butt kicked, they will revert to the original Core 2 Duo design and start competing again.

Oh, and I predict that AMD will release a new rating called the "core plus rating", so their CPUs will be an Athlon Core 50+, meaning it has the equivalent of 50 cores. Cue n00bs who realize they have only 8 cores and complain.

And to think I didn't like history in school. Maybe I just hadn't seen enough of it to understand.

Software hasn't really improved for many years now. Spreadsheets and word processors are more colourful, higher resolution. But are these products smarter, better at all? Would a postgraduate write a better doctoral thesis with Office 2007 than with, say, Word 6.0? Is image manipulation that much better with the latest Photoshop than with PS 5.5? With some minor exceptions, the answer is clearly no.

- We were promised Virtual Reality with VR helmets more than 10 years ago. Is this _just_ a matter of hardware?
- Smart voice recognition? Anyone tried it lately? Anyone tried to write pretty standard letters with it? Disastrous.
- Intelligent assistants, understanding the user's needs? Operating system/application wizards that improve their capabilities while you're working with 'em?

The applications are missing, they're faster, more colourful, higher resolution, antialiased... but still DUMB.

Computers are already pretty powerful, please start and make the software smarter, not faster.

Is image manipulation that much better with the latest Photoshop than with PS 5.5? With some minor exceptions the answer is clearly no.

Hah! I am forced to disagree in the strongest possible terms.

Speaking as a former production artist and current art director, the last couple of generations of graphics software have introduced powerful tools that streamline my workflow in ways I find it hard to even fathom. Ok, let's talk about Illustrator, for example. From 10 -> CS Adobe added in-application 3D rendering of any 2D artwork onto geometric primitives. This is something I used to either have to fake, or take out of the application and into a 3D renderer in order to render simple bottle/can/box packaging proofs. Marketing wants to make a copy change? Make the change to the 2D art and the 3D rendering is updated in real time. Oh, and the new version of InDesign recognizes that the art has been updated and reloads it into the brochure layout. Automatically.

This is just one feature out of literally hundreds. This one alone saves me an hour or two a day. Seriously, there are projects I can take on today that would have been unthinkable 5 years ago. Pre-press for a 700 page illustrated book project has gone from a week of painful, tedious work down to 30 minutes, of which 20 is letting the PDF render. Seriously.

Here's the thing: unless you use a piece of software all day, every day, you're really not in any position to comment on how much it has or hasn't changed.

Photoshop (et al.) is software for professionals, despite the number of dilettantes out there using it for sprucing up their MySpace page.

The big question is how these processors interconnect. Cached shared memory probably won't scale up that high. An SGI study years ago indicated that 20 CPUs was roughly the upper limit before
the cache synchronization load became the bottleneck. That number changes somewhat with the hardware technology, but a workable 80-way shared-memory machine seems unlikely.

There are many alternatives to shared memory, and most of them, historically, are duds. The usual idea is to provide some kind of memory copy function between processors. The IBM Cell is the latest incarnation of this idea, but it has a long and disappointing history, going back to the nCube, the BBN Butterfly, and even the ILLIAC IV from the 1960s. Most of these, including the Cell, suffered from not having enough memory per processor.

Historically, shared-memory multiprocessors work, and loosely coupled network based clusters work. But nothing in between has ever been notably successful.

One big problem has typically been that the protection hardware in non-shared-memory multiprocessors hasn't been well worked out. The InfiniBand people are starting to think about this. They have a system for setting up one-way queues between machines in such a way that applications can queue data for another machine without going through the OS, yet while retaining memory protection. That's a good idea. It's not well integrated into the CPU architecture, because it's an add-on as an I/O device. But it's a start.

You need two basic facilities in a non-shared memory multiprocessor - the ability to make a synchronous call (like a subroutine call) to another processor, and the ability to queue bulk data in a one-way fashion. (Yes, you can build one from the other, but there's a major performance hit if you do. You need good support for both.) These are the same facilities one has for interprocess communication in operating systems that support it well. (QNX probably leads in this; Minix 3 could get there. If you have to implement this, look at how QNX does it, and learn why it was so slow in Mach.)
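Those two primitives can be sketched in a few lines (Python threads standing in for separate processors; purely illustrative, nothing to do with how QNX actually implements them):

```python
import threading
import queue

class RemoteNode:
    """Toy model of a non-shared-memory node offering both primitives."""
    def __init__(self):
        self.calls = queue.Queue()    # synchronous call channel
        self.bulk = queue.Queue()     # one-way bulk data channel
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            func, arg, reply = self.calls.get()
            reply.put(func(arg))      # compute, then send the reply back

    def call(self, func, arg):
        """Synchronous call: block until the remote node replies."""
        reply = queue.Queue()
        self.calls.put((func, arg, reply))
        return reply.get()

    def send(self, data):
        """One-way queue: fire and forget, no reply expected."""
        self.bulk.put(data)
```

Note the asymmetry the parent post describes: `call` pays a round-trip latency for every invocation, while `send` can stream bulk data with no waiting, which is why emulating one primitive with the other costs so much.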

They are claiming a terabyte-per-second interconnect. I think it is safe to assume it will be something like InfiniBand, Myrinet or a similar (NEC's IXS, IBM's HPS) high-performance application networking technology. What you're asking for is pretty standard stuff in the high end, where hundreds of processors are quite common. Cache coherency is a killer, and so coherent designs died out long ago in the high end. When you think about it, CC basically requires a crossbar-switch-style memory architecture.

Um... I guess it ain't clear then... the parent post was saying that you need OS support for accelerated remote procedure calls and one-sided communications. However, one-sided communication is already in standard use by folks using hundreds of processors, through an already-standardized library: MPI, the Message Passing Interface. Rather than the OS needing to define a new API, the folks creating high-speed interconnects just create optimized libraries (in order to sell their hardware).

Modularity. Do you know how huge and complicated a system with 40 dual-core CPUs would be? There's always use for massively parallel processing, and news like this is a godsend for data centers, which are currently suffering from power/heat limitations (and expense). Just a few years ago, having a four-processor system meant getting a big motherboard or a custom system with multiple motherboards/processor cards. In 2007 you'll be able to put a quad-core CPU in a dinky mATX HTPC motherboard.

Just as Gates couldn't imagine what anyone would want with more memory than 640KB, we can't imagine what people will do with 80 cores. I'm confident in predicting that they'll find ways to use every bit of that capacity and demand more.


He wasn't asking what 80 cores are good for. He's asking why we need 80 cores on one chip, as opposed to 40 dual-core processors, for example. And it's a good question. I imagine that these 80 cores can communicate extremely fast between their nearest neighbors.

I could make a joke about Windows Vista... but instead I will say that we will one day think nothing of having one core for each process that we're running and have a massively fast system (with solid state hard drives ;)

Also, technology can never develop too far. What if I want to set up my computer to talk to me like the one on the Enterprise-D? ... I do worry about the computers rebelling and trying to take over, though.

OK, IBM did get egg on their face for saying that the world only needed 5 computers, so it is dangerous to predict the future, but 80-core chips seem absurd.

The costs to make use of 80 cores (you're going to need hugely complex chips and hugely complex memory buses) mean that these chips will be severe overkill for PCs and will be outside any typical user's price range. They're only going to be useful for a few servers in very niche applications. If there's only demand for, say, 10,000 of these chips in the world, then they're going to be extremely expensive.

I smell marketing horseshit. I think they're just saying this to get people to start thinking of multi-core options. Most people don't see the need for multi-core (even 2-core) systems. Saying you'll get 80 cores in 5 years makes people start thinking that they should start using 2 or 4 cores now.

I can imagine at least 10, 50 or 100 at every college in the country, more for technical schools. That right there is a million units.

Then throw in the scientific community, the numerical analysis community (aka banks & wall street), and anyone else who wants to get their hands on cheap teraflops and are used to proprietary operating systems.

Now multiply those figures by the number of countries in the world that have the same needs.

The costs to make use of 80 cores (you're going to need hugely complex chips and hugely complex memory buses) mean that these chips will be severe overkill for PCs and will be outside any typical user's price range.

Unless the complexity makes the manufacturing vastly expensive, rather than just the development, this won't be true: the more widely it's sold, the less of a development premium there will be, because the development costs will be spread more widely.

The major limitation to the effectiveness of multi-core chips is described, at least in part, by Amdahl's Law.
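Amdahl's Law puts a hard ceiling on what 80 cores can buy you: with parallel fraction p and n cores, the maximum speedup is 1 / ((1 - p) + p/n). A quick calculation (Python, for illustration) shows that even a 95%-parallel workload tops out around 16x on 80 cores:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup for parallel fraction p of the work on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even heavily parallel code gets nowhere near 80x on 80 cores:
for p in (0.50, 0.90, 0.95, 0.99):
    print(f"p={p:.2f}: {amdahl_speedup(p, 80):.1f}x")
```

The serial fraction dominates almost immediately, which is exactly the memory-bandwidth worry below: any part of the workload that serializes on a shared resource counts against p.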

Things like memory bandwidth are already constraining 2-core chips. The only way to effectively mitigate this is to make wider bus paths. That's relatively easy for 2 core chips, but to get any benefit from 80-core chips you're going to need 40x the memory bandwidth you have now. That means huge pin-outs, huge amounts of RAM, huge everything.

These are not going to be systems that every college department can afford.

Comparing where we are today to twelve years ago, and expecting the same or greater multiplier is absurd.

In the 80s and early 90s, most of the bus speed limitations were due to capacitance issues (i.e., how fast we can switch a transistor and discharge the capacitance). We can make things faster by reducing capacitance through various measures. But memory buses are now getting so fast that they're starting to be constrained by the speed of light and the like, so it is getting harder to find large multiplier improvements.

I think there is still a lot of room for new stuff, maybe twice or four times what we have now. The biggest improvements that can be made, however, are in power reduction etc.

Comparing where we are today to twelve years ago, and expecting the same or greater multiplier is absurd.

Er, I wasn't pointing to any particular multiplier. I was pointing out that, even if you are right that, when released, these would be prohibitively expensive for most purchasers, history suggests that processors go from "prohibitively expensive for most users" to "common" to "you really need to upgrade that old piece of crap" pretty quickly.

Saying you'll get 80 cores in 5 years makes people start thinking that they should start using 2 or 4 cores now.

Do we have compilers optimized for this sort of architecture today?

I expect that lots of work has been done so that multiple instances of Oracle RAC run properly on an E25K, but that seems like a fairly specific scenario. Does Intel have a C compiler that was designed for multiple-CPU systems? What about GCC?

If all they did was increase clock speeds, we wouldn't need as many major advancements.

Video processing is why consumers will eventually want 80-core chips. Many video algorithms are extremely parallelizable. Heck, modern video cards have double-digit numbers of shader units already, and consumers buy them. Generating video images in real time is extremely parallelizable. Software rasterizers could easily use 80 cores. More excitingly, real-time raytracing would be feasible with 80 cores; no video card required. HD videos tax modern single cores just being decoded, and encoding is even more demanding.

No they don't. Right now I'm building a Linux kernel and it is only using approx 35% of the CPU. Why? Because my memory and disk are not fast enough. If I swapped out the CPU and kept everything else the same, it would not go much faster. Sure, with a faster motherboard etc. I could get better speed, but that is very difficult to scale to 80 cores.

As I said before... getting 80 cores working properly is going to require huge amounts of memory, as well as hugely wide memory buses.

I'm not sure this is 80 general-purpose processing cores: the article claims that there are "80 floating point cores". Clearly, the big selling points of the chip are, in Intel's view, its data transfer at 1 TB/sec, and its floating point speed at 1 TFLOP.

I can see uses for two, maybe 4 cores, but what are the advantages of an 80-core chip as opposed to a system with 40 2-core processors, which we can have now?

In fact the summary and the write-up are very confusing, or even slightly wrong. According to what I took from the keynote, the architecture is something similar to ClearSpeed [clearspeed.com], which already has more than 80 parallel floating point cores.

A basic strategy would be for the OS to devote each process to its own processor.

This would reduce the need for TLB/cache flushes or eliminate context switches entirely. The whole machine would be really snappy.
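A naive version of that per-process placement policy is easy to sketch (Python, purely illustrative; a real OS would also weigh cache topology, load, and core inequalities):

```python
def assign_cores(pids, n_cores=80):
    """Pin each process to its own core, wrapping around only if the
    process count exceeds the core count."""
    placement = {}
    for i, pid in enumerate(pids):
        placement[pid] = i % n_cores   # dedicated core while pids <= cores
    return placement
```

With fewer processes than cores, every process keeps its core for its whole lifetime, so its TLB and cache contents are never evicted by a context switch, which is the "really snappy" effect described above.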

That said, for a desktop machine, this is a huge amount of overkill, but with economies of scale being what they are, we'll probably have this power available soon.

What I'd like to see more though, is extra functionality in hardware rather than more of it. Wouldn't it be great if hardware was able to handle some of the things an OS is now used for, like memory (de)allocation? Or if we could tag memory according to type? Or if there were finer-grained controls than page-level?

So it's some sort of Pentium-ish beastie with 80 floating point units, not an 80-core CPU.

Although, I could easily find a use for either one. Just off the top of my head, an 80-FPU machine would be an excellent science/simulation machine. And it'd probably make some fairly decent graphics for games. You could use that much floating point for voice recognition.

What massively parallel tasks would possibly need 80 cores? I can see uses for two, maybe 4 cores, but what are the advantages of an 80-core chip?

Someone the other day told me that I don't do "serious work" because my application only has 10 or so threads; they claim to write server applications that run hundreds of threads. Such an application can easily make use of 80 cores. My app (running 10 threads) could benefit from 80 cores too, because it won't be the only app you are running.

Not quite. It says the number of transistors that can be put on an IC for a given amount of money doubled every 12-24 months (depending on when you asked Gordon). It's not just about transistor density, but also about the size of a die you can reliably make.

Of course, if you want to throw more money at a project, you can make a much bigger die. Look at the high-end Itaniums (Itania?) for some examples.

Moore's law says nothing about speed; that is a common error. IIRC he made a general statement about the number of transistors that could fit in a defined area doubling every 2 years, which was later revised to 18 months. It also had to do with the cost of transistors, I believe.
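The doubling claim is easy to sanity-check (Python; the starting count and periods here are just illustrative numbers, not Moore's actual data):

```python
def transistor_count(initial: float, years: float,
                     doubling_period: float = 2.0) -> float:
    """Project transistor count under a fixed doubling period."""
    return initial * 2 ** (years / doubling_period)

# Starting from 1M transistors, a 2-year doubling gives 8M after 6 years,
# while an 18-month doubling gives 16M over the same span.
```

The gap between the 2-year and 18-month readings compounds quickly, which is why which version of the "law" you quote matters so much for long-range predictions like this 80-core chip.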

Isn't it kind of immature for a company to think, "Hmm, we have had relative success with dual-core processors over our competitors... so let's fit as many cores into a mobo as possible... that will get AMD"?

It might be, but it's pretty immature of you to assume that that is what they are thinking.

While I believe that multi-core technology needs to be developed further, there are also other things for Intel to be researching.

And...so? Does this announcement imply that Intel isn't researching other things?