
We heard there could be as many as 16 graphics cores packed into a single die.

That's a lot of cores.

How complex is a current GMA X3000 core? If you shrink the process down to CPU scale, how many could you pack into a P4-sized, or maybe Core 2 Duo-sized, piece of silicon?

Using the X3000 core as a basis would get you 128 programmable pipelines in a 16-way core. So that's probably wrong... (assuming they are going to use the X3000 design fairly directly.)

32nm

I don't think so. 45nm is more likely, I figure.

The only thing I know about this sort of thing is that when you shrink the process for making a CPU down a step, you basically have to rebuild the entire assembly line. The whole plant. That's partly because at the same time you usually make the silicon wafer bigger to get higher yields per wafer.

So since Intel would have all this spare assembly line lying around, it would make sense to turn it into massive multicore GPU designs. You could also cut more corners than you can with CPUs: if there's a flaw in the chip or in the silicon wafer, you just deactivate the cores that have the flaw. A 'pure' die would be the high end with all 16 GPU cores; with a third of the die goobered up, you have a mid-range video card with 12 cores; and with half or more gone, you have a 'low end' card with 6-8 cores.
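That binning idea can be sketched in a few lines. The tier boundaries and names below are my own assumptions for illustration, not anything Intel has announced:

```python
# Hypothetical binning of a 16-core GPU die into product tiers, based on
# how many cores survive fabrication defects. Tier cutoffs are invented.

def bin_die(good_cores: int) -> str:
    """Map the number of working cores on a die to a product tier."""
    if good_cores >= 16:
        return "high-end (16 cores)"
    if good_cores >= 12:
        return "mid-range (12 cores, flawed cores fused off)"
    if good_cores >= 6:
        return "low-end (6-8 cores)"
    return "scrap"

print(bin_die(16))  # prints "high-end (16 cores)"
print(bin_die(13))  # prints "mid-range (12 cores, flawed cores fused off)"
print(bin_die(7))   # prints "low-end (6-8 cores)"
```

The point is that almost every die that comes off the line is sellable at some price, which is exactly why the scheme improves effective yield.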

So the video card fabrication process will always follow one generation behind the latest process used for CPUs. If I am right, it will probably be about the size, power requirements, and expense of the current Core 2 Duo CPUs.

These things range from $150 to about $700 right now, just for the CPU. Of course the top-of-the-line CPU is incredibly overpriced. So I figure $350-500 for the entire card to start off with?

It's quite a competitive advantage that Intel is going to have over Nvidia. Nvidia will have to build all-new plants to move up to the next generation of fabrication, while Intel can use the old equipment already bought and paid for by CPU sales and still be just as advanced, or more.

But if that's the right video, he talks a lot about 7.2 and the future direction of X.org 7.3.

He also gives a nice overview of his work with Intel hardware (mostly on how it relates to X.org 7.3 and such), and mentions Intel's intentions with Linux driver support.

They now do Linux driver development in-house with Keith's (and other hackers') assistance. Traditionally Linux driver development has lagged behind Windows, but it is now Intel's goal to ship working (and completely open source) Linux drivers the same day the corresponding hardware ships.

This means, hopefully, that as soon as these things start showing up in stores you can just buy them and they will run on Linux, with the drivers right on the supplied CD-ROM.

Comment

How complex is a current GMA X3000 core? If you shrink the process down to CPU scale, how many could you pack into a P4-sized, or maybe Core 2 Duo-sized, piece of silicon?

Probably about 16-ish. It's Intel's most complicated chip attempted to date on the GPU front. It's been panned on the "review" sites because Intel shipped the design without full Windows-side drivers. It's got a lot of promise, and if the open source drivers are decent (which I'm hoping they are), it'd be a decent choice as a discrete part by itself.

Using the X3000 core as a basis would get you 128 programmable pipelines in a 16-way core. So that's probably wrong... (assuming they are going to use the X3000 design fairly directly.)

That would be about correct. Top-end cards are running 32-ish right now (G80...). I'm not QUITE sure how someone's arriving at 16x the fastest cards out right now, but I could buy a 3x-4x advantage if they could pull off the management of resources, etc., on an X3000-based multicore design, probably with less power consumption. Right now this is all guesswork on our part: we've no idea what the pipelines in the X3000 can fully do yet, or whether they're even USING that core in the multicore design. They could have a 16/32-pipe core already in the pipeline as the baseline discrete part for all we know.

If I am correct, it sounds like it will make 'software rendering' faster than 'hardware rendering'. That would mean massive stability gains, and new features arriving faster than anything else possible.

Just for people that don't know..

OpenGL is a general-purpose API for 3D applications. It's not just for games, and although it's designed to work with hardware acceleration, hardware acceleration isn't necessary to successfully run OpenGL programs.

In fact, consumer cards only accelerate a portion of the OpenGL API: just the stuff that is CPU-intensive and tends to get used in games. One of the differences between 'workstation'-class cards and 'consumer' or 'gamer'-class cards is that the workstation cards accelerate more of OpenGL, if my understanding is correct.

On Linux, the DRI drivers are actually based on the Mesa software OpenGL stack. The driver programmers take this software stack and accelerate as much of it as possible given their understanding of the hardware. Anything they don't accelerate falls back to software.
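The fallback pattern can be sketched roughly like this. The function and primitive names are invented for illustration; the real Mesa dispatch is a C table of driver hooks, not a Python dict:

```python
# Rough sketch of Mesa-style dispatch: each GL operation is routed to a
# hardware implementation when the driver provides one, otherwise to the
# software rasterizer. All names here are made up for illustration.

def software_draw(primitive):
    return f"software-rendered {primitive}"

def hardware_draw(primitive):
    return f"hardware-rendered {primitive}"

# The driver fills in the entries it can accelerate; everything else
# silently falls back to the software path.
dispatch = {
    "triangles": hardware_draw,   # accelerated by the chip
    # "wide_lines" deliberately absent: no hardware support
}

def draw(primitive):
    impl = dispatch.get(primitive, software_draw)
    return impl(primitive)

print(draw("triangles"))   # prints "hardware-rendered triangles"
print(draw("wide_lines"))  # prints "software-rendered wide_lines"
```

The nice property is that the application can't tell the difference: it always gets a correct result, just slower on the unaccelerated paths.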

But if the GPU is x86-based then it's going to take relatively little effort to port Mesa over to run on this video card.

In addition to effectively making 'semi-software rendering' faster than anything coming out of ATI or Nvidia, you could port all sorts of extra stuff over to it very easily.

Media encoding, audio acceleration, physics acceleration, AI, scientific computing, etc. Almost anything that runs on x86 and can benefit from massively parallel processing.

[q]The Boeing 777 model contains roughly 350,000,000 (350 million) triangles, which arrived in a compressed (!) form on 12 CDs. The entire model to be rendered (including all triangles, BSP trees etc.), consumes roughly 30-60 GByte on disk. We render the full model, without any simplifications or approximations including pixel-accurate shadows and highlights.[/q]

[q]Currently, we use a single AMD Opteron 1.8GHz CPU. The machine is a dual-CPU. We currently get around 1-3 frames per second at 640x480 pixels on that setup, depending on the actual view. Some simple views run even faster, the 1-3 fps correspond to the images as shown above.[/q]

That's realtime performance. Sure, it's only 1-3 FPS, but that's 350 million triangles being rendered.
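A quick back-of-envelope on those figures (my arithmetic, not from the Boeing demo writeup) shows why this is plausible at all:

```python
# How many primary rays per second the quoted 640x480 at 1-3 fps implies.
width, height = 640, 480
pixels = width * height              # 307,200 primary rays per frame
for fps in (1, 3):
    rays_per_sec = pixels * fps
    print(f"{fps} fps -> {rays_per_sec:,} primary rays/sec")

# The key point: a ray tracer's cost grows roughly with
# rays * log(triangle count), thanks to the BSP/BVH traversal,
# rather than linearly with all 350M triangles the way a naive
# rasterizer would. That's how an interactive rate on such a huge
# model is possible on a single CPU.
```

So the frame cost is dominated by the pixel count and the tree traversal depth, not by the raw size of the model.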

Could you imagine playing a game like Grand Theft Auto where, instead of only rendering the stuff that is close up, with simpler models farther out until the view just cuts off, it renders the entire GTA world in realtime, with all the people in full detail, by tracing the rays of your view rather than rasterizing and clipping the models themselves?

This sort of stuff seems to me like it will revolutionise graphics on the PC. Moving to a software model rather than a hardware model is going to make things a lot more flexible, with a lot more performance and increased image quality.

Comment

This sort of stuff seems to me like it will revolutionise graphics on the PC. Moving to a software model rather than a hardware model is going to make things a lot more flexible, with a lot more performance and increased image quality.

What most people don't know is that any modern DirectX 9.0/OpenGL 2.X capable card is already doing software rendering- right now.

3D graphics is stupidly SIMD. So are physics computations.

It's why you can do physics on a GPU alongside rendering. It's why you're seeing AMD and Nvidia fielding research-project supercomputers in a single PC box that trash 32-64 box clusters on speed.

Drivers these days take requests for the old fixed-function functionality, translate them to GLSL or HLSL (which are simplistic versions of C), compile them, and then intermix the resulting ALU code for the accelerator with compilations from modern API code, running the programs in turn as the applications ask for them. It's why we're still a bit slower than we probably ought to be seeing in the X3000 benchmarks done recently on Phoronix: the open source crowd is still learning to walk before really running. (One thing I'd like to know about the X3000 benchmarks, though, is whether the full feature set was turned on by default. There are features of the DRI drivers that are off by default at the moment on at least some of the drivers, things like hardware TCL, etc.) On paper, the X3000 should be a slightly better performer than it's showing to be right now (though it IS doing well all the same...).

So, in reality, this isn't too far-fetched and has been brewing for a long while now. What remains to be seen is whether this is PR spin from Intel or the real deal, and if it's the real deal, whether they can deliver on the potential promise AND keep the critical programming details open. There's a good chance they're going to use open source as an edge as they try to lever themselves into this space, but it's NOT a foregone conclusion. We all know where those rumors of AMD releasing enough information to allow open source drivers for the R300-R500 chipsets have gone: NOWHERE.

Comment

I know one of the features of the Cell processor, which Sony left out of their PS3 design, is that it basically has a way to do Cell-to-Cell bus networking.

And it can even be external. Theoretically you have an optical bus where you connect two or more machines' Cells together. I don't know if that is what they do for the Cell clusters used in scientific computing; not sure.

The lack of this for something like x86-based Beowulf clusters is of course why clusters, even though the majority of today's supercomputers are clusters, are very limited in certain types of high-performance computing tasks.

If you had a very low-latency interconnect, it could make distributed shared memory schemes work for clusters, so you'd have something like multiple Linux kernels running on nodes with a single system root and a single addressable memory space, fully multithreaded like an SMP machine. (See Kerrighed.org for a current attempt to do this with Linux on Beowulf-style clusters over ordinary networking hardware.)

It would be interesting if Intel came out with something like that for x86 systems. Imagine a rack of machines, each with 2 of these Intel video cards in dual sockets, and very low-latency, high-bandwidth interconnects. They'd give Cell-based supercomputers a run for their money.

Of course, it would also be nice to be able to increase the 3D performance of your laptop simply by using a special plug to connect it to your desktop.

Although Intel could be betting that a traditional cluster of machines with 3-4 PCIe 16x (or 32x, or whatever) CPU/GPU 'daughter cards' wouldn't require something very special like a Cell-to-Cell bus to compete with IBM for high-end computing (and could thus take advantage of the price differential for more 'purely' commodity hardware).

Who knows. I guess it's all fantasy at this point. But it's promising for Linux that it's currently the platform of choice for this sort of thing. Openness is certainly a plus for anybody wanting to get more traction, and given the failure of Itanium versus POWER, I'd expect Intel is looking hard in that direction.

One thing I'd like to know about the X3000 benchmarks, though, is whether the full feature set was turned on by default. (There are features of the DRI drivers that are off by default at the moment on at least some of the drivers, things like hardware TCL, etc.) On paper, the X3000 should be a slightly better performer than it's showing to be right now (though it IS doing well all the same...).

I believe the chipset they are testing in these benchmarks is actually the GMA 3000 and not the X3000. The difference is that with the Q965 chipset (GMA 3000) you do not actually get the hardware T&L and such that you'd have with the G965.

From my personal experience with my Intel Pentium D with GMA 950, I can tell you that the Mesa folks are making very good progress.

I am using Debian Unstable right now, which uses the X.org 7.1 release, and that supports the GMA 950 out of the box.

However, one of my favorite games is True Combat: Elite, a full modification for Enemy Territory. It requires more resources to run than regular ET, since it has better graphics (with some added eye candy such as 'HD' lighting), better sound, and more detailed textures.

With the stock drivers it was unplayable (well, at least to the point where it wasn't fun). Even compiling updated DRI drivers and trying all sorts of tweaks didn't really work out that wonderfully, although it made Nexuiz mostly playable.

Within Mesa, the 915 driver is a sort of guinea pig. They have this modesetting branch, for instance, which I don't think the newer cards even have.

Well, they have a newer DRI driver they call the 915tex driver. You compile it alongside the regular 915 DRI driver, and the X.org drv driver chooses which one to use.

The special thing about this driver is that it has the new texture memory management support being worked on in Mesa. With this, it is able to dynamically allocate RAM as you need it, which is terrific.

So I have to compile git versions of xf86-video-intel, drm, mesa, and a special linux-agp-compat, so that I can get the same agpgart support you get with the latest -mm kernels.
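For anyone wanting to try the same thing, the build went roughly like this. The repository URLs and make target are from my memory of the freedesktop.org git setup of the time, so treat them as assumptions and check each project's README for its actual build steps:

```shell
# Sketch of building the bleeding-edge Intel 3D stack from git.
# URLs and the Mesa build target are assumptions; verify before use.
git clone git://anongit.freedesktop.org/git/mesa/drm
git clone git://anongit.freedesktop.org/git/mesa/mesa
git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

(cd drm && ./autogen.sh && make && sudo make install)
(cd mesa && make linux-dri-x86)   # builds the DRI drivers, incl. i915/915tex
(cd xf86-video-intel && ./autogen.sh && make && sudo make install)
```

The linux-agp-compat module is a separate out-of-tree build against your running kernel, so it isn't shown here.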

So that represents about the bleeding edge of open source Linux 3D driver development. Using that, I am now able to play TC:E quite well. It's not perfect, but using a 16-bit depth setting in xorg.conf along with the 'fastest' graphical settings at 800x600, I am able to be quite competitive. The only graphical lag I get is in scenes with lots of smoke, or where there is a lot of detail and distance going on, and even then it's not so bad that I can't still shoot people in the head. :P

The nice thing is that all the tweaking and playing around I had to do before now actually reduces performance. These seem to be fairly well-optimized drivers.

Unfortunately, even though it's very stable during gameplay (unless you're playing Warsow, which causes lockups), I can't successfully pull off a complete ET railgun benchmark yet to see how it compares with this article's benchmarks.

I think that once the memory management stuff gets sorted out in Mesa, you will have decent enough performance out of the X3000 for most Linux games, maybe even including Doom 3 (with all the details turned down). (Although I really doubt you'd be able to get away with playing Quake 4 online.)

I am hoping they'll have the memory management stuff stable by the 7.3 release, especially since this would increase the usability (and reduce the memory usage, for example) of things like Beryl or Compiz to the point where they can be used by normal people with no sweat.

Comment

I know one of the features of the Cell processor, which Sony left out of their PS3 design, is that it basically has a way to do Cell-to-Cell bus networking.

Not needed in the PS3- but in Mercury's Cell BE cluster box, it'd be a different story, I'd suspect...

I believe the chipset they are testing in these benchmarks is actually the GMA 3000 and not the X3000. The difference is that with the Q965 chipset (GMA 3000) you do not actually get the hardware T&L and such that you'd have with the G965.

This makes the results even more interesting and impressive. I've an R200-based setup with a mobile P4 that handles quite a few games that bring my Xpress 200M-based main laptop to its knees, to the point where I just can't play them there. But it can only do that if I use the DRI tuning panel to dial up hardware T&L and so forth... We probably ought to get numbers for a G965 motherboard, then...

From my personal experience with my Intel Pentium D with GMA 950, I can tell you that the Mesa folks are making very good progress.

< * background details of the situation deleted for brevity... * >

I think that once the memory management stuff gets sorted out in Mesa, you will have decent enough performance out of the X3000 for most Linux games, maybe even including Doom 3 (with all the details turned down). (Although I really doubt you'd be able to get away with playing Quake 4 online.)

I am hoping they'll have the memory management stuff stable by the 7.3 release, especially since this would increase the usability (and reduce the memory usage, for example) of things like Beryl or Compiz to the point where they can be used by normal people with no sweat.

Interesting. On paper, the X3000 is actually a decent GPU. On the specs front, the thing weighs in at around an AMD X600-X800 class part, if the specs match what Intel's fielding for the GPU. If so, and the DRI drivers are using it effectively, you should be able to play even Quake 4 with the settings dialed back a bit. Of course, it's all rampant supposition on my part at this point, not having one in hand, taking Intel at their word (that'd be a sure way to get burned!), and comparing Intel's theoreticals to the same theoreticals from AMD. So our mileage on this one may vary quite a bit.

I only need to point at chips from other players in the market to prove the difference between on-paper and delivered performance, and it's mainly due to poor drivers, or a mismatch at the silicon level that precludes a well-performing driver under a given OS.

Which brings us back to the Larrabee discussion. If it's like they're describing, IF they pull it off, and IF they open up the info like they have with the 3000/X3000 cores, then we may have a winner here. Well, we might have a winner with the X3000 too; it's just not certain at this point. I just wish Intel hadn't rushed this one to market before the drivers were ready, since it left the fanboy reviewers room to pan the part before it was really ready for primetime.

Comment

I've seen it mentioned in a few places that Intel's chipset for their mobile stuff is going to be a bit different from their PC stuff.

GM965 will be using an updated core designed to support DirectX 10, supposedly called the X4000.

Looks like the next machine I am going to buy is a Santa Rosa "Centrino Pro" laptop.

One interesting side feature is going to be its onboard flash drive for Vista's ReadyBoost/ReadyDrive stuff, as well as an EFI replacement for the BIOS and probably 802.11a/b/g/n support. Should be interesting.

I don't know what you'd use the flash drive for in Linux, though. Maybe as swap, to make hibernate faster?

I also heard that on servers it's possible to get better write performance by having a nice flash drive, sticking the ext3 journal on it, and enabling full journaling.
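For reference, ext3 does support this via an external journal device. A minimal sketch, with placeholder device names (this reformats both partitions, so it destroys data):

```shell
# Sketch: put an ext3 journal on a separate (flash) device and enable
# full data journaling. /dev/sdb1 and /dev/sda1 are placeholders.

# 1. Format the flash partition as a dedicated external journal.
mke2fs -O journal_dev /dev/sdb1

# 2. Create the ext3 filesystem, pointing it at that journal device.
mke2fs -j -J device=/dev/sdb1 /dev/sda1

# 3. Mount with full data journaling, so file data (not just metadata)
#    hits the fast journal before being written back to the main disk.
mount -t ext3 -o data=journal /dev/sda1 /mnt
```

With data=journal, writes are acknowledged once they reach the journal, which is what makes a fast journal device pay off.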

I wonder how the scheduling works with that, and whether it would allow the disk to sleep longer before committing writes. I dunno.

Comment

That is an UGLY solution. Brute force, if there is such a thing. On the other hand, it's just crazy enough that it might work. Getting it to run with a good power/heat-to-speed ratio is something else, not to mention doing it affordably. Intel obviously thinks they can, though, which may mean that Moore's law has taken over again. (Using scalar processors in a GPU. Huh!)

Comment

If Intel can come up with a real winner, then I am all for it. Choice is better, and it will be kind of fun to watch the other big boys (Nvidia and ATI/AMD) squirm! Intel's got the cash and the talent to make some really great GPUs, so this will be a very welcome development indeed. And open drivers would be the kicker as well.